UI and CLI Interfaces for Duplicate Detection Indexing

This document describes the user interfaces (Web UI and CLI) for the optimized duplicate detection system.

Web UI

Index Management Page

Location: /indexing

Features:

Index Destination Directory
- Select dataset from dropdown
- Enter destination path
- Configure batch size (default: 100)
- Choose between:
  - Index: Add new files to the index
  - Re-index: Clear and rebuild the entire index

Index Statistics
- View count of indexed files for selected dataset
- Real-time updates after indexing operations

Duplicate Statistics
- Total duplicate groups count
- List of duplicate files with:
  - Dataset name
  - File count
  - File size
  - Hash preview
  - File paths
- Shows up to 10 duplicate groups at a time

Navigation:

Available in main navigation menu under "Indexing"
Quick access from Duplicates page via "Manage Index" button

Enhanced Duplicates Page

Location: /duplicates

New Features:

Manage Index button for quick access to indexing page
Duplicate scan now uses the database automatically when an index is available
Faster scan times for destinations that have already been indexed

CLI Commands

Duplicate Detection Commands

Scan for Duplicates

watch-finished-cli duplicates:scan [options]

Options:

--reset: Reset existing duplicate groups

Example:

watch-finished-cli duplicates:scan
watch-finished-cli duplicates:scan --reset

List Duplicate Groups

watch-finished-cli duplicates:list [options]

Options:

--status <status>: Filter by status (pending/reviewed/purged)
--dataset <dataset>: Filter by dataset

Example:

watch-finished-cli duplicates:list
watch-finished-cli duplicates:list --status pending --dataset movies

Indexing Commands

Index Destination

watch-finished-cli index:destination --dataset <dataset> --destination <path> [options]

Required:

--dataset <dataset>: Dataset name
--destination <path>: Destination directory path

Options:

--reindex: Clear and rebuild the index
--batch-size <size>: Number of files to process at once (default: 100)

Example:

# Index a destination
watch-finished-cli index:destination \
  --dataset movies \
  --destination /media/movies

# Re-index (clear and rebuild)
watch-finished-cli index:destination \
  --dataset movies \
  --destination /media/movies \
  --reindex \
  --batch-size 200

View Duplicate Statistics

watch-finished-cli index:stats [options]

Options:

--dataset <dataset>: Filter by dataset

Example:

watch-finished-cli index:stats
watch-finished-cli index:stats --dataset movies

Check Index Count

watch-finished-cli index:count --dataset <dataset> [options]

Required:

--dataset <dataset>: Dataset name

Options:

--destination <path>: Filter by destination path

Example:

watch-finished-cli index:count --dataset movies
watch-finished-cli index:count --dataset movies --destination /media/movies

Clear Index

watch-finished-cli index:clear --dataset <dataset> [options]

Required:

--dataset <dataset>: Dataset name

Options:

--destination <path>: Filter by destination path

Example:

watch-finished-cli index:clear --dataset movies
watch-finished-cli index:clear --dataset movies --destination /media/movies
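
Because index:clear removes index entries, a script that calls it may want a confirmation step first. The following is a minimal sketch, assuming the CLI does not prompt on its own (the documentation above does not say either way). The helper names confirm and clear_index_with_confirm and the CLI override variable are hypothetical and not part of the tool.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: ask for confirmation before clearing an index.
# CLI can be overridden (for example in tests); it defaults to the documented binary name.
CLI="${CLI:-watch-finished-cli}"

# Read one line from stdin; succeed only on an explicit yes.
confirm() {
  local prompt="$1" reply
  printf '%s [y/N] ' "$prompt" >&2
  read -r reply || return 1
  case "$reply" in
    y|Y|yes|Yes|YES) return 0 ;;
    *)               return 1 ;;
  esac
}

# Usage: clear_index_with_confirm <dataset> [destination]
clear_index_with_confirm() {
  local dataset="$1" destination="${2:-}"
  local args=(index:clear --dataset "$dataset")
  if [[ -n "$destination" ]]; then
    args+=(--destination "$destination")
  fi
  if confirm "Clear index for dataset '$dataset'?"; then
    "$CLI" "${args[@]}"
  else
    echo "Aborted; index left unchanged." >&2
    return 1
  fi
}

# Example (not run here):
#   clear_index_with_confirm movies /media/movies
```

Anything other than an explicit yes aborts, so an accidental Enter leaves the index untouched.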

Workflow Examples

Web UI Workflow

Navigate to Indexing page from main menu
Select a dataset (e.g., "movies")
Enter destination path (e.g., "/media/movies")
Click Index to start indexing
Wait for completion (progress shown via toast notifications)
View index statistics to verify
Navigate to Duplicates page
Click Rescan to detect duplicates (uses database)
Review and manage duplicates

CLI Workflow

# 1. Index destination
watch-finished-cli index:destination \
  --dataset movies \
  --destination /media/movies

# Output: ✅ Indexed: 1234, Skipped: 5, Errors: 0

# 2. Check index count
watch-finished-cli index:count --dataset movies

# Output: 📈 Indexed files for movies: 1234

# 3. View duplicate statistics
watch-finished-cli index:stats --dataset movies

# Output: Shows duplicate groups with details

# 4. Scan for duplicates (uses database)
watch-finished-cli duplicates:scan

# Output: ✅ Scan complete

# 5. List duplicates
watch-finished-cli duplicates:list --dataset movies

# Output: Shows detailed list of duplicate groups
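
When the workflow runs unattended, the summary line from index:destination can decide whether the later steps should run. The sketch below assumes the summary keeps the "Indexed: N, Skipped: N, Errors: N" shape shown in the example output above; the real format may differ between versions, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the "Errors: N" count from index:destination
# output read on stdin. Assumes the summary format shown in this document.
index_error_count() {
  sed -n 's/^.*Errors: \([0-9][0-9]*\).*$/\1/p' | tail -n 1
}

# Succeeds only if a summary line was found and it reports zero errors.
index_succeeded() {
  local count
  count="$(index_error_count)"
  [[ -n "$count" && "$count" -eq 0 ]]
}

# Example (not run here):
#   out="$(watch-finished-cli index:destination --dataset movies --destination /media/movies)"
#   printf '%s\n' "$out"
#   printf '%s\n' "$out" | index_succeeded && watch-finished-cli duplicates:scan
```

Treating a missing summary line as a failure is a deliberate choice: if the output format changes, the script stops instead of scanning against a possibly incomplete index.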

Tips

Web UI

Real-time Updates: Statistics update immediately after indexing
Batch Size: Adjust to file size (a larger batch suits many small files; a smaller batch suits large files)
Dark Mode: Fully supported for comfortable viewing
Responsive: Works on desktop and tablet devices

CLI

Colored Output: Uses chalk for better readability
Progress Feedback: Shows emojis and progress indicators
Error Handling: Clear error messages with suggestions
Chaining: Can be used in scripts for automation

Best Practices

Index First: Always index destinations before scanning for duplicates
Re-index Periodically: Re-index when many files have been added
Check Count: Verify index count matches expected file count
Monitor Stats: Use stats command to track duplicate trends
Automate: Create scripts to index and scan on a schedule
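
The "Check Count" practice can be scripted by comparing the number of files on disk with the number reported by index:count. This is a sketch under two assumptions: the output ends with the count, as in the "Indexed files for movies: 1234" example in the CLI workflow, and every regular file under the destination is expected to be indexed. All function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the "Check Count" practice.

# Count regular files under a directory, recursively.
# Filenames that contain newlines would be over-counted.
count_files() {
  find "$1" -type f | wc -l | tr -d '[:space:]'
}

# Extract the trailing integer from index:count output read on stdin.
# Assumes the "Indexed files for <dataset>: <n>" format shown in this document.
parse_index_count() {
  sed -n 's/.*: \([0-9][0-9]*\)[[:space:]]*$/\1/p' | tail -n 1
}

# Usage: compare_counts <indexed> <on_disk>
compare_counts() {
  if [[ "$1" -eq "$2" ]]; then
    echo "OK: index matches disk ($1 files)"
  else
    echo "MISMATCH: indexed=$1 on_disk=$2"
    return 1
  fi
}

# Example (not run here):
#   indexed="$(watch-finished-cli index:count --dataset movies | parse_index_count)"
#   compare_counts "$indexed" "$(count_files /media/movies)"
```

A mismatch is not necessarily an error: the indexing summary reports skipped files, so a small difference may be expected.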

Troubleshooting

Web UI

Issue: Index count is 0 after indexing

Solution: Check destination path is correct
Solution: Ensure files exist in the destination
Solution: Check browser console for errors

Issue: Duplicates not showing after scan

Solution: Index destinations first
Solution: Click "Rescan" to refresh results
Solution: Check if duplicates actually exist

CLI

Issue: Command not found

Solution: Run pnpm install in apps/cli directory
Solution: Use full path: node apps/cli/dist/index.js
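
For scripts that must run whether or not the binary is on PATH, the two solutions above can be combined in a small wrapper. This is a sketch only: wf_cli is a hypothetical name, and the fallback assumes the script is run from the repository root so that apps/cli/dist/index.js resolves.

```bash
#!/usr/bin/env bash
# Hypothetical helper: use the CLI from PATH if present, otherwise fall back
# to the built entry point mentioned above (path relative to the repo root).
wf_cli() {
  if command -v watch-finished-cli >/dev/null 2>&1; then
    watch-finished-cli "$@"
  elif [[ -f apps/cli/dist/index.js ]]; then
    node apps/cli/dist/index.js "$@"
  else
    echo "watch-finished-cli not found on PATH or at apps/cli/dist/index.js" >&2
    return 127
  fi
}

# Example (not run here):
#   wf_cli index:count --dataset movies
```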

Issue: Connection error

Solution: Verify service is running
Solution: Check API_BASE environment variable
Solution: Ensure correct port (default: 3000)
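
A quick pre-flight check can separate "service not running" from "wrong API_BASE". The sketch below assumes API_BASE holds a base URL. The fallback http://localhost:3000 is a guess built from the default port mentioned above, not a confirmed default, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the "Connection error" case.

# Print the base URL the check will use.
# Assumption: http://localhost:3000 stands in for the service default.
effective_api_base() {
  printf '%s\n' "${API_BASE:-http://localhost:3000}"
}

# Succeeds if anything answers at the base URL within 5 seconds.
# Any HTTP response (even 404) counts; only connection-level failures fail.
service_reachable() {
  curl --silent --output /dev/null --max-time 5 "$(effective_api_base)"
}

# Example (not run here):
#   if service_reachable; then
#     echo "Service answers at $(effective_api_base)"
#   else
#     echo "Nothing reachable at $(effective_api_base)" >&2
#   fi
```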

Issue: Slow indexing

Solution: Increase batch size
Solution: Run on server with fast disk I/O
Solution: Index during off-peak hours

Advanced Usage

Scripting Example

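#!/bin/bash
# Index every dataset, then scan for duplicates and print statistics.
set -euo pipefail  # Stop on the first failed command or unset variable

# DATASETS and DESTINATIONS are parallel arrays:
# entry i of one pairs with entry i of the other.
DATASETS=("movies" "tvshows" "music")
DESTINATIONS=(
  "/media/movies"
  "/media/tvshows"
  "/media/music"
)

for i in "${!DATASETS[@]}"; do
  dataset="${DATASETS[$i]}"
  destination="${DESTINATIONS[$i]}"

  echo "Indexing $dataset..."
  watch-finished-cli index:destination \
    --dataset "$dataset" \
    --destination "$destination" \
    --batch-size 150
done

# Scan only after every destination has been indexed
echo "Running duplicate scan..."
watch-finished-cli duplicates:scan

echo "Getting duplicate stats..."
watch-finished-cli index:stats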
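
Parallel arrays can drift out of step when one list is edited and the other is not. As an alternative sketch, an associative array keeps each dataset next to its destination. This variant requires bash 4 or newer, and it continues past a failed dataset and reports failures at the end. The index_all function and the CLI override variable are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: keep each dataset next to its destination (requires bash 4+).
CLI="${CLI:-watch-finished-cli}"   # overridable, for example for a dry run

declare -A DESTINATIONS=(
  [movies]="/media/movies"
  [tvshows]="/media/tvshows"
  [music]="/media/music"
)

# Index every dataset; keep going on failure and list failed datasets at the end.
# Note: bash does not guarantee the iteration order of associative arrays.
index_all() {
  local dataset failed=()
  for dataset in "${!DESTINATIONS[@]}"; do
    echo "Indexing $dataset..."
    if ! "$CLI" index:destination \
        --dataset "$dataset" \
        --destination "${DESTINATIONS[$dataset]}" \
        --batch-size 150; then
      failed+=("$dataset")
    fi
  done
  if (( ${#failed[@]} > 0 )); then
    echo "Failed datasets: ${failed[*]}" >&2
    return 1
  fi
}

# Example (not run here):
#   index_all && "$CLI" duplicates:scan && "$CLI" index:stats
```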

Automation with Cron

# Re-index daily at 2 AM
0 2 * * * /path/to/watch-finished-cli index:destination --dataset movies --destination /media/movies --reindex

# Scan for duplicates daily at 3 AM
0 3 * * * /path/to/watch-finished-cli duplicates:scan

# Weekly stats email
0 8 * * 1 /path/to/watch-finished-cli index:stats | mail -s "Weekly Duplicate Stats" admin@example.com
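
The 2 AM and 3 AM entries above rely on the re-index finishing within an hour. One way to remove that timing assumption is a single wrapper that scans only after a successful re-index and skips the run if the previous one is still active. This is a sketch: the function names, the lock file location, and the script path in the crontab comment are hypothetical, and flock comes from util-linux, so it may be missing on non-Linux systems.

```bash
#!/usr/bin/env bash
# Hypothetical cron wrapper: re-index, then scan only if re-indexing succeeded.
CLI="${CLI:-/path/to/watch-finished-cli}"
LOCK_FILE="${LOCK_FILE:-/tmp/watch-finished-nightly.lock}"

# Usage: nightly <dataset> <destination>
nightly() {
  local dataset="$1" destination="$2"
  "$CLI" index:destination --dataset "$dataset" --destination "$destination" --reindex \
    && "$CLI" duplicates:scan
}

# Run nightly under an exclusive, non-blocking lock so overlapping runs are skipped.
nightly_locked() {
  (
    flock -n 9 || { echo "Previous run still active; skipping." >&2; exit 1; }
    nightly "$@"
  ) 9>"$LOCK_FILE"
}

# A script meant for cron would end with a call such as:
#   nightly_locked movies /media/movies
# and replace the two daily entries with one crontab line, for example:
#   0 2 * * * /path/to/nightly.sh
```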
