UI and CLI Interfaces for Duplicate Detection Indexing

This document describes the user interfaces (Web UI and CLI) for the optimized duplicate detection system.

Web UI

Index Management Page

Location: /indexing

Features:

  1. Index Destination Directory

    • Select dataset from dropdown
    • Enter destination path
    • Configure batch size (default: 100)
    • Choose between:
      • Index: Add new files to the index
      • Re-index: Clear and rebuild the entire index
  2. Index Statistics

    • View count of indexed files for selected dataset
    • Real-time updates after indexing operations
  3. Duplicate Statistics

    • Total duplicate groups count
    • List of duplicate files with:
      • Dataset name
      • File count
      • File size
      • Hash preview
      • File paths
    • Shows up to 10 duplicate groups at a time

Navigation:

  • Available in main navigation menu under "Indexing"
  • Quick access from Duplicates page via "Manage Index" button

Enhanced Duplicates Page

Location: /duplicates

New Features:

  • Manage Index button for quick access to indexing page
  • Duplicate scans now use the index database automatically when it is available
  • Faster scan times for indexed destinations

CLI Commands

Duplicate Detection Commands

Scan for Duplicates

watch-finished-cli duplicates:scan [options]

Options:

  • --reset: Reset existing duplicate groups

Example:

watch-finished-cli duplicates:scan
watch-finished-cli duplicates:scan --reset

List Duplicate Groups

watch-finished-cli duplicates:list [options]

Options:

  • --status <status>: Filter by status (pending/reviewed/purged)
  • --dataset <dataset>: Filter by dataset

Example:

watch-finished-cli duplicates:list
watch-finished-cli duplicates:list --status pending --dataset movies
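
Because --status accepts only three values, a script can reject typos before calling the CLI. The following is a minimal sketch, not part of the tool: the list_duplicates function and the CLI override variable are hypothetical helper names introduced here for illustration.

```shell
# Hypothetical wrapper (not shipped with the CLI).
# CLI can be overridden, e.g. with a stub during testing.
CLI="${CLI:-watch-finished-cli}"

# list_duplicates STATUS [DATASET]
# Validates STATUS against the documented values, then lists duplicate groups.
list_duplicates() {
  case "$1" in
    pending|reviewed|purged) ;;
    *)
      echo "invalid status: $1 (expected pending, reviewed, or purged)" >&2
      return 2
      ;;
  esac
  if [ -n "${2:-}" ]; then
    "$CLI" duplicates:list --status "$1" --dataset "$2"
  else
    "$CLI" duplicates:list --status "$1"
  fi
}
```

For example, list_duplicates pending movies runs the same command as the second example above, while list_duplicates pendng returns exit code 2 without contacting the service.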

Indexing Commands

Index Destination

watch-finished-cli index:destination --dataset <dataset> --destination <path> [options]

Required:

  • --dataset <dataset>: Dataset name
  • --destination <path>: Destination directory path

Options:

  • --reindex: Clear and rebuild the index
  • --batch-size <size>: Number of files to process at once (default: 100)

Example:

# Index a destination
watch-finished-cli index:destination \
  --dataset movies \
  --destination /media/movies

# Re-index (clear and rebuild)
watch-finished-cli index:destination \
  --dataset movies \
  --destination /media/movies \
  --reindex \
  --batch-size 200
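
When indexing is scripted, it can help to validate the arguments before the call is made. The sketch below assumes the script runs on the same machine as the service, so that the destination path is visible locally; if the path only exists on the server, the directory check does not apply. The index_destination function and the CLI variable are hypothetical names.

```shell
# Hypothetical wrapper (not shipped with the CLI).
CLI="${CLI:-watch-finished-cli}"

# index_destination DATASET DESTINATION [BATCH_SIZE]
# Checks the destination and batch size, then indexes (default batch size: 100).
index_destination() {
  # Assumption: the destination path is reachable from this machine.
  if [ ! -d "$2" ]; then
    echo "destination is not a directory: $2" >&2
    return 1
  fi
  # Reject empty, non-numeric, and zero batch sizes.
  case "${3:-100}" in
    ''|*[!0-9]*|0)
      echo "batch size must be a positive integer: ${3:-}" >&2
      return 1
      ;;
  esac
  "$CLI" index:destination --dataset "$1" --destination "$2" \
    --batch-size "${3:-100}"
}
```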

View Duplicate Statistics

watch-finished-cli index:stats [options]

Options:

  • --dataset <dataset>: Filter by dataset

Example:

watch-finished-cli index:stats
watch-finished-cli index:stats --dataset movies

Check Index Count

watch-finished-cli index:count --dataset <dataset> [options]

Required:

  • --dataset <dataset>: Dataset name

Options:

  • --destination <path>: Filter by destination path

Example:

watch-finished-cli index:count --dataset movies
watch-finished-cli index:count --dataset movies --destination /media/movies

Clear Index

watch-finished-cli index:clear --dataset <dataset> [options]

Required:

  • --dataset <dataset>: Dataset name

Options:

  • --destination <path>: Clear only entries for this destination path

Example:

watch-finished-cli index:clear --dataset movies
watch-finished-cli index:clear --dataset movies --destination /media/movies
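
Since clearing is destructive, a script may want to show what is about to be removed and require an explicit opt-in. The following is one possible guard, sketched under the assumption that index:count and index:clear accept the same --destination option as documented above. The clear_index_confirmed function and the CONFIRM and CLI variables are hypothetical.

```shell
# Hypothetical guard (not shipped with the CLI).
CLI="${CLI:-watch-finished-cli}"

# clear_index_confirmed DATASET [--destination PATH]
# Prints the current index count, then clears only when CONFIRM=yes is set.
clear_index_confirmed() {
  dataset="$1"
  shift
  # Remaining arguments (e.g. --destination /media/movies) pass through as-is.
  "$CLI" index:count --dataset "$dataset" "$@" || return 1
  if [ "${CONFIRM:-no}" != "yes" ]; then
    echo "dry run: set CONFIRM=yes to clear the index for $dataset" >&2
    return 0
  fi
  "$CLI" index:clear --dataset "$dataset" "$@"
}
```

Without CONFIRM=yes the function only reports the count, which makes it safe to try the command line first and confirm afterwards.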

Workflow Examples

Web UI Workflow

  1. Navigate to Indexing page from main menu
  2. Select a dataset (e.g., "movies")
  3. Enter destination path (e.g., "/media/movies")
  4. Click Index to start indexing
  5. Wait for completion (progress shown via toast notifications)
  6. View index statistics to verify
  7. Navigate to Duplicates page
  8. Click Rescan to detect duplicates (uses database)
  9. Review and manage duplicates

CLI Workflow

# 1. Index destination
watch-finished-cli index:destination \
  --dataset movies \
  --destination /media/movies

# Output: ✅ Indexed: 1234, Skipped: 5, Errors: 0

# 2. Check index count
watch-finished-cli index:count --dataset movies

# Output: 📈 Indexed files for movies: 1234

# 3. View duplicate statistics
watch-finished-cli index:stats --dataset movies

# Output: Shows duplicate groups with details

# 4. Scan for duplicates (uses database)
watch-finished-cli duplicates:scan

# Output: ✅ Scan complete

# 5. List duplicates
watch-finished-cli duplicates:list --dataset movies

# Output: Shows detailed list of duplicate groups
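
In automation, the summary line from step 1 can be checked before moving on to the scan. The sketch below assumes the summary has the "Indexed: N, Skipped: N, Errors: N" shape shown in the workflow above; the real output may differ between versions. The summary_field and index_ok functions are hypothetical helpers.

```shell
# Hypothetical helpers for reading the indexing summary line.

# summary_field LINE FIELD
# Prints the number that follows "FIELD: " in LINE; fails if FIELD is absent.
summary_field() {
  case "$1" in
    *"$2: "*) ;;
    *) return 1 ;;
  esac
  rest="${1#*"$2: "}"                # drop everything up to and including "FIELD: "
  printf '%s\n' "${rest%%[!0-9]*}"   # keep only the leading digits
}

# index_ok LINE
# Succeeds only when the Errors count in LINE is exactly 0.
index_ok() {
  [ "$(summary_field "$1" Errors)" = "0" ]
}
```

A script could capture the output of index:destination in a variable, pass it to index_ok, and skip duplicates:scan when the check fails.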

Tips

Web UI

  • Real-time Updates: Statistics update immediately after indexing
  • Batch Size: Tune to typical file size; use a larger batch for many small files and a smaller one for large files (default: 100)
  • Dark Mode: Fully supported for comfortable viewing
  • Responsive: Works on desktop and tablet devices

CLI

  • Colored Output: Uses chalk for better readability
  • Progress Feedback: Shows emojis and progress indicators
  • Error Handling: Clear error messages with suggestions
  • Scripting: Commands can be chained in shell scripts for automation

Best Practices

  1. Index First: Index destinations before scanning for duplicates so the scan can use the database
  2. Re-index Periodically: Run a re-index (clear and rebuild) after large changes to a destination, such as many files being added
  3. Check Count: Verify that the index count matches the expected file count
  4. Monitor Stats: Use the index:stats command to track duplicate trends
  5. Automate: Create scripts to index and scan on a schedule
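
The count check in item 3 can be scripted roughly as follows. This sketch rests on several assumptions: the script can see the destination on its local filesystem, the index covers every regular file under the destination recursively, and index:count prints a line ending in ": N" as in the workflow example. Skipped files will show up as a mismatch. The check_index_count function and the CLI variable are hypothetical.

```shell
# Hypothetical check (not shipped with the CLI).
CLI="${CLI:-watch-finished-cli}"

# check_index_count DATASET DESTINATION
# Compares the indexed count with the number of regular files on disk.
check_index_count() {
  # Counts one line per file, so file names containing newlines would skew it.
  on_disk="$(find "$2" -type f | wc -l | tr -d ' ')"
  out="$("$CLI" index:count --dataset "$1" --destination "$2")" || return 1
  # Assumption: the output ends with ": <number>".
  indexed="${out##*: }"
  if [ "$indexed" = "$on_disk" ]; then
    echo "ok: $indexed files indexed"
  else
    echo "mismatch: $indexed indexed, $on_disk on disk" >&2
    return 1
  fi
}
```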

Troubleshooting

Web UI

Issue: Index count is 0 after indexing

  • Solution: Check destination path is correct
  • Solution: Ensure files exist in the destination
  • Solution: Check browser console for errors

Issue: Duplicates not showing after scan

  • Solution: Index destinations first
  • Solution: Click "Rescan" to refresh results
  • Solution: Check if duplicates actually exist

CLI

Issue: Command not found

  • Solution: Run pnpm install in apps/cli directory
  • Solution: Use full path: node apps/cli/dist/index.js
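
A script can apply the same fallback automatically. The sketch below tries the installed command first and then the built entry point named above; resolve_cli is a hypothetical helper, and the optional argument is assumed to be the repository root.

```shell
# Hypothetical helper (not shipped with the CLI).

# resolve_cli [REPO_ROOT]
# Prints a command line that starts the CLI, or fails with a hint.
resolve_cli() {
  if command -v watch-finished-cli >/dev/null 2>&1; then
    echo "watch-finished-cli"
  elif [ -f "${1:-.}/apps/cli/dist/index.js" ]; then
    echo "node ${1:-.}/apps/cli/dist/index.js"
  else
    echo "watch-finished-cli not found; run pnpm install in apps/cli" >&2
    return 1
  fi
}
```

The printed command is meant to be used unquoted, so this approach does not work if the repository path contains spaces.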

Issue: Connection error

  • Solution: Verify service is running
  • Solution: Check API_BASE environment variable
  • Solution: Ensure correct port (default: 3000)
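
These checks can be combined into a quick reachability probe. The sketch assumes that API_BASE holds a full base URL and that an unset value means http://localhost:3000; only the default port comes from this document, the rest of that URL is a guess. The api_base and service_reachable functions are hypothetical, and curl must be installed.

```shell
# Hypothetical probe (not shipped with the CLI).

# api_base
# Prints the configured API_BASE, or an assumed local default.
api_base() {
  printf '%s\n' "${API_BASE:-http://localhost:3000}"
}

# service_reachable
# Succeeds if anything answers HTTP at the base URL within 5 seconds.
# Any HTTP status counts as reachable; only connection failures are errors.
service_reachable() {
  curl -sS -o /dev/null --max-time 5 "$(api_base)"
}
```

If service_reachable fails, the curl error message usually distinguishes a refused connection (service not running or wrong port) from a name resolution problem (wrong host in API_BASE).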

Issue: Slow indexing

  • Solution: Increase batch size
  • Solution: Run on server with fast disk I/O
  • Solution: Index during off-peak hours

Advanced Usage

Scripting Example

#!/bin/bash
# Index all datasets, then scan for duplicates and print statistics.
set -euo pipefail  # abort on the first failed command or unset variable

# Parallel arrays: DATASETS[i] is indexed from DESTINATIONS[i].
DATASETS=("movies" "tvshows" "music")
DESTINATIONS=(
  "/media/movies"
  "/media/tvshows"
  "/media/music"
)

for i in "${!DATASETS[@]}"; do
  dataset="${DATASETS[$i]}"
  destination="${DESTINATIONS[$i]}"

  echo "Indexing $dataset..."
  watch-finished-cli index:destination \
    --dataset "$dataset" \
    --destination "$destination" \
    --batch-size 150
done

echo "Running duplicate scan..."
watch-finished-cli duplicates:scan

echo "Getting duplicate stats..."
watch-finished-cli index:stats

Automation with Cron

# Re-index daily at 2 AM
0 2 * * * /path/to/watch-finished-cli index:destination --dataset movies --destination /media/movies --reindex

# Scan for duplicates daily at 3 AM (assumes the 2 AM re-index has finished by then)
0 3 * * * /path/to/watch-finished-cli duplicates:scan

# Weekly stats email
0 8 * * 1 /path/to/watch-finished-cli index:stats | mail -s "Weekly Duplicate Stats" admin@example.com
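
Separate cron entries for re-indexing and scanning depend on the first job finishing before the second one starts. One alternative is a single wrapper that runs both steps in order and refuses to overlap with a previous run. The following is a sketch only: nightly_run, the CLI variable, and the LOCK_DIR path are hypothetical, and a lock left behind by a killed run has to be removed by hand.

```shell
# Hypothetical nightly wrapper (not shipped with the CLI).
CLI="${CLI:-watch-finished-cli}"
LOCK_DIR="${LOCK_DIR:-/tmp/watch-finished-nightly.lock}"

# nightly_run DATASET DESTINATION
# Re-indexes, then scans, and skips the run if another one is in progress.
nightly_run() {
  # mkdir either creates the directory or fails, so it works as a simple lock.
  if ! mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "another run holds $LOCK_DIR; skipping" >&2
    return 0
  fi
  rc=0
  # The scan starts only after the re-index has succeeded.
  "$CLI" index:destination --dataset "$1" --destination "$2" --reindex \
    && "$CLI" duplicates:scan \
    || rc=$?
  rmdir "$LOCK_DIR"
  return "$rc"
}
```

Saved in a script that ends with a call such as nightly_run movies /media/movies, this could replace the 2 AM and 3 AM entries with a single cron line.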