# UI and CLI Interfaces for Duplicate Detection Indexing

This document describes the user interfaces (Web UI and CLI) for the optimized duplicate detection system.

## Web UI

### Index Management Page

**Location:** `/indexing`

**Features:**

1. **Index Destination Directory**
   - Select a dataset from the dropdown
   - Enter the destination path
   - Configure the batch size (default: 100)
   - Choose between:
     - **Index**: Add new files to the index
     - **Re-index**: Clear and rebuild the entire index
2. **Index Statistics**
   - View the count of indexed files for the selected dataset
   - Real-time updates after indexing operations
3. **Duplicate Statistics**
   - Total duplicate group count
   - List of duplicate files with:
     - Dataset name
     - File count
     - File size
     - Hash preview
     - File paths
   - Shows up to 10 duplicate groups at a time

**Navigation:**

- Available in the main navigation menu under "Indexing"
- Quick access from the Duplicates page via the "Manage Index" button

### Enhanced Duplicates Page

**Location:** `/duplicates`

**New Features:**

- **Manage Index** button for quick access to the indexing page
- Duplicate scans now automatically use the database when available
- Faster scan times for indexed destinations

## CLI Commands

### Duplicate Detection Commands

#### Scan for Duplicates

```bash
watch-finished-cli duplicates:scan [options]
```

**Options:**

- `--reset`: Reset existing duplicate groups

**Example:**

```bash
watch-finished-cli duplicates:scan
watch-finished-cli duplicates:scan --reset
```

#### List Duplicate Groups

```bash
watch-finished-cli duplicates:list [options]
```

**Options:**

- `--status <status>`: Filter by status (pending/reviewed/purged)
- `--dataset <name>`: Filter by dataset

**Example:**

```bash
watch-finished-cli duplicates:list
watch-finished-cli duplicates:list --status pending --dataset movies
```

### Indexing Commands

#### Index Destination

```bash
watch-finished-cli index:destination --dataset <name> --destination <path> [options]
```

**Required:**

- `--dataset <name>`: Dataset name
- `--destination <path>`: Destination directory path

**Options:**

- `--reindex`: Clear and rebuild the index
- `--batch-size <size>`: Number of files to process at once (default: 100)

**Example:**

```bash
# Index a destination
watch-finished-cli index:destination \
  --dataset movies \
  --destination /media/movies

# Re-index (clear and rebuild)
watch-finished-cli index:destination \
  --dataset movies \
  --destination /media/movies \
  --reindex \
  --batch-size 200
```

#### View Duplicate Statistics

```bash
watch-finished-cli index:stats [options]
```

**Options:**

- `--dataset <name>`: Filter by dataset

**Example:**

```bash
watch-finished-cli index:stats
watch-finished-cli index:stats --dataset movies
```

#### Check Index Count

```bash
watch-finished-cli index:count --dataset <name> [options]
```

**Required:**

- `--dataset <name>`: Dataset name

**Options:**

- `--destination <path>`: Filter by destination path

**Example:**

```bash
watch-finished-cli index:count --dataset movies
watch-finished-cli index:count --dataset movies --destination /media/movies
```

#### Clear Index

```bash
watch-finished-cli index:clear --dataset <name> [options]
```

**Required:**

- `--dataset <name>`: Dataset name

**Options:**

- `--destination <path>`: Filter by destination path

**Example:**

```bash
watch-finished-cli index:clear --dataset movies
watch-finished-cli index:clear --dataset movies --destination /media/movies
```

## Workflow Examples

### Web UI Workflow

1. Navigate to the **Indexing** page from the main menu
2. Select a dataset (e.g., "movies")
3. Enter the destination path (e.g., "/media/movies")
4. Click **Index** to start indexing
5. Wait for completion (progress is shown via toast notifications)
6. View the index statistics to verify
7. Navigate to the **Duplicates** page
8. Click **Rescan** to detect duplicates (uses the database)
9. Review and manage duplicates

### CLI Workflow

```bash
# 1. Index the destination
watch-finished-cli index:destination \
  --dataset movies \
  --destination /media/movies
# Output: ✅ Indexed: 1234, Skipped: 5, Errors: 0

# 2. Check the index count
watch-finished-cli index:count --dataset movies
# Output: 📈 Indexed files for movies: 1234

# 3. View duplicate statistics
watch-finished-cli index:stats --dataset movies
# Output: Shows duplicate groups with details

# 4. Scan for duplicates (uses the database)
watch-finished-cli duplicates:scan
# Output: ✅ Scan complete

# 5. List duplicates
watch-finished-cli duplicates:list --dataset movies
# Output: Shows a detailed list of duplicate groups
```

## Tips

### Web UI

- **Real-time Updates**: Statistics update immediately after indexing
- **Batch Size**: Adjust based on file size (use a larger batch for small files)
- **Dark Mode**: Fully supported for comfortable viewing
- **Responsive**: Works on desktop and tablet devices

### CLI

- **Colored Output**: Uses chalk for better readability
- **Progress Feedback**: Shows emojis and progress indicators
- **Error Handling**: Clear error messages with suggestions
- **Chaining**: Commands can be combined in scripts for automation

### Best Practices

1. **Index First**: Always index destinations before scanning for duplicates
2. **Re-index Periodically**: Re-index after many files have been added
3. **Check Count**: Verify the index count matches the expected file count
4. **Monitor Stats**: Use the stats command to track duplicate trends
5. **Automate**: Create scripts to index and scan on a schedule

## Troubleshooting

### Web UI

**Issue:** Index count is 0 after indexing

- **Solution:** Check that the destination path is correct
- **Solution:** Ensure files exist in the destination
- **Solution:** Check the browser console for errors

**Issue:** Duplicates not showing after a scan

- **Solution:** Index destinations first
- **Solution:** Click "Rescan" to refresh the results
- **Solution:** Check whether duplicates actually exist

### CLI

**Issue:** Command not found

- **Solution:** Run `pnpm install` in the apps/cli directory
- **Solution:** Use the full path: `node apps/cli/dist/index.js`

**Issue:** Connection error

- **Solution:** Verify the service is running
- **Solution:** Check the API_BASE environment variable
- **Solution:** Ensure the correct port is used (default: 3000)

**Issue:** Slow indexing

- **Solution:** Increase the batch size
- **Solution:** Run on a server with fast disk I/O
- **Solution:** Index during off-peak hours

## Advanced Usage

### Scripting Example

```bash
#!/bin/bash
# Index all datasets

DATASETS=("movies" "tvshows" "music")
DESTINATIONS=(
  "/media/movies"
  "/media/tvshows"
  "/media/music"
)

for i in "${!DATASETS[@]}"; do
  dataset="${DATASETS[$i]}"
  destination="${DESTINATIONS[$i]}"

  echo "Indexing $dataset..."
  watch-finished-cli index:destination \
    --dataset "$dataset" \
    --destination "$destination" \
    --batch-size 150
done

echo "Running duplicate scan..."
watch-finished-cli duplicates:scan

echo "Getting duplicate stats..."
watch-finished-cli index:stats
```

### Automation with Cron

```cron
# Re-index daily at 2 AM
0 2 * * * /path/to/watch-finished-cli index:destination --dataset movies --destination /media/movies --reindex

# Scan for duplicates daily at 3 AM
0 3 * * * /path/to/watch-finished-cli duplicates:scan

# Weekly stats email
0 8 * * 1 /path/to/watch-finished-cli index:stats | mail -s "Weekly Duplicate Stats" admin@example.com
```