# UI and CLI Interfaces for Duplicate Detection Indexing

This document describes the user interfaces (Web UI and CLI) for the optimized duplicate detection system.

## Web UI

### Index Management Page

**Location:** `/indexing`

**Features:**

1. **Index Destination Directory**
   - Select a dataset from the dropdown
   - Enter the destination path
   - Configure the batch size (default: 100)
   - Choose between:
     - **Index**: Add new files to the index
     - **Re-index**: Clear and rebuild the entire index
2. **Index Statistics**
   - View the count of indexed files for the selected dataset
   - Real-time updates after indexing operations
3. **Duplicate Statistics**
   - Total duplicate group count
   - List of duplicate files with:
     - Dataset name
     - File count
     - File size
     - Hash preview
     - File paths
   - Shows up to 10 duplicate groups at a time

**Navigation:**

- Available in the main navigation menu under "Indexing"
- Quick access from the Duplicates page via the "Manage Index" button

### Enhanced Duplicates Page

**Location:** `/duplicates`

**New Features:**

- **Manage Index** button for quick access to the indexing page
- Duplicate scans now automatically use the database when available
- Faster scan times for indexed destinations

## CLI Commands

### Duplicate Detection Commands

#### Scan for Duplicates

```bash
watch-finished-cli duplicates:scan [options]
```

**Options:**

- `--reset`: Reset existing duplicate groups

**Example:**

```bash
watch-finished-cli duplicates:scan
watch-finished-cli duplicates:scan --reset
```

#### List Duplicate Groups

```bash
watch-finished-cli duplicates:list [options]
```

**Options:**

- `--status <status>`: Filter by status (pending/reviewed/purged)
- `--dataset <name>`: Filter by dataset

**Example:**

```bash
watch-finished-cli duplicates:list
watch-finished-cli duplicates:list --status pending --dataset movies
```

### Indexing Commands

#### Index Destination

```bash
watch-finished-cli index:destination --dataset <name> --destination <path> [options]
```

**Required:**

- `--dataset <name>`: Dataset name
- `--destination <path>`: Destination directory path

**Options:**

- `--reindex`: Clear and rebuild the index
- `--batch-size <size>`: Number of files to process at once (default: 100)

**Example:**

```bash
# Index a destination
watch-finished-cli index:destination \
  --dataset movies \
  --destination /media/movies

# Re-index (clear and rebuild)
watch-finished-cli index:destination \
  --dataset movies \
  --destination /media/movies \
  --reindex \
  --batch-size 200
```

#### View Duplicate Statistics

```bash
watch-finished-cli index:stats [options]
```

**Options:**

- `--dataset <name>`: Filter by dataset

**Example:**

```bash
watch-finished-cli index:stats
watch-finished-cli index:stats --dataset movies
```

#### Check Index Count

```bash
watch-finished-cli index:count --dataset <name> [options]
```

**Required:**

- `--dataset <name>`: Dataset name

**Options:**

- `--destination <path>`: Filter by destination path

**Example:**

```bash
watch-finished-cli index:count --dataset movies
watch-finished-cli index:count --dataset movies --destination /media/movies
```

#### Clear Index

```bash
watch-finished-cli index:clear --dataset <name> [options]
```

**Required:**

- `--dataset <name>`: Dataset name

**Options:**

- `--destination <path>`: Filter by destination path

**Example:**

```bash
watch-finished-cli index:clear --dataset movies
watch-finished-cli index:clear --dataset movies --destination /media/movies
```

## Workflow Examples

### Web UI Workflow

1. Navigate to the **Indexing** page from the main menu
2. Select a dataset (e.g., "movies")
3. Enter the destination path (e.g., "/media/movies")
4. Click **Index** to start indexing
5. Wait for completion (progress is shown via toast notifications)
6. View the index statistics to verify
7. Navigate to the **Duplicates** page
8. Click **Rescan** to detect duplicates (uses the database)
9. Review and manage duplicates

### CLI Workflow

```bash
# 1. Index the destination
watch-finished-cli index:destination \
  --dataset movies \
  --destination /media/movies
# Output: ✅ Indexed: 1234, Skipped: 5, Errors: 0

# 2. Check the index count
watch-finished-cli index:count --dataset movies
# Output: 📈 Indexed files for movies: 1234

# 3. View duplicate statistics
watch-finished-cli index:stats --dataset movies
# Output: Shows duplicate groups with details

# 4. Scan for duplicates (uses the database)
watch-finished-cli duplicates:scan
# Output: ✅ Scan complete

# 5. List duplicates
watch-finished-cli duplicates:list --dataset movies
# Output: Shows a detailed list of duplicate groups
```

## Tips

### Web UI

- **Real-time Updates**: Statistics update immediately after indexing
- **Batch Size**: Adjust based on file size (use a larger batch for small files)
- **Dark Mode**: Fully supported for comfortable viewing
- **Responsive**: Works on desktop and tablet devices

### CLI

- **Colored Output**: Uses chalk for better readability
- **Progress Feedback**: Shows emojis and progress indicators
- **Error Handling**: Clear error messages with suggestions
- **Chaining**: Commands can be combined in scripts for automation

### Best Practices

1. **Index First**: Always index destinations before scanning for duplicates
2. **Re-index Periodically**: Re-index after many files have been added
3. **Check Count**: Verify the index count matches the expected file count
4. **Monitor Stats**: Use the stats command to track duplicate trends
5. **Automate**: Create scripts to index and scan on a schedule

## Troubleshooting

### Web UI

**Issue:** Index count is 0 after indexing

- **Solution:** Check that the destination path is correct
- **Solution:** Ensure files exist in the destination
- **Solution:** Check the browser console for errors

**Issue:** Duplicates not showing after a scan

- **Solution:** Index destinations first
- **Solution:** Click "Rescan" to refresh the results
- **Solution:** Check whether duplicates actually exist

### CLI

**Issue:** Command not found

- **Solution:** Run `pnpm install` in the apps/cli directory
- **Solution:** Use the full path: `node apps/cli/dist/index.js`

**Issue:** Connection error

- **Solution:** Verify the service is running
- **Solution:** Check the API_BASE environment variable
- **Solution:** Ensure the correct port is used (default: 3000)

**Issue:** Slow indexing

- **Solution:** Increase the batch size
- **Solution:** Run on a server with fast disk I/O
- **Solution:** Index during off-peak hours

## Advanced Usage

### Scripting Example

```bash
#!/bin/bash
# Index all datasets

DATASETS=("movies" "tvshows" "music")
DESTINATIONS=(
  "/media/movies"
  "/media/tvshows"
  "/media/music"
)

for i in "${!DATASETS[@]}"; do
  dataset="${DATASETS[$i]}"
  destination="${DESTINATIONS[$i]}"

  echo "Indexing $dataset..."
  watch-finished-cli index:destination \
    --dataset "$dataset" \
    --destination "$destination" \
    --batch-size 150
done

echo "Running duplicate scan..."
watch-finished-cli duplicates:scan

echo "Getting duplicate stats..."
watch-finished-cli index:stats
```

### Automation with Cron

```cron
# Re-index daily at 2 AM
0 2 * * * /path/to/watch-finished-cli index:destination --dataset movies --destination /media/movies --reindex

# Scan for duplicates daily at 3 AM
0 3 * * * /path/to/watch-finished-cli duplicates:scan

# Weekly stats email
0 8 * * 1 /path/to/watch-finished-cli index:stats | mail -s "Weekly Duplicate Stats" admin@example.com
```