# GPU Support for Talk2Me

## Current GPU Support Status

### ✅ NVIDIA GPUs (Full Support)

- **Requirements**: CUDA 11.x or 12.x
- **Optimizations** (see the sketch after this list):
  - TensorFloat-32 (TF32) for Ampere GPUs (RTX 30xx, A100)
  - cuDNN auto-tuning
  - Half-precision (FP16) inference
  - CUDA kernel pre-caching
  - Memory pre-allocation

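These optimizations map onto standard PyTorch switches. The following is a rough, illustrative sketch of how they can be enabled, not Talk2Me's actual startup code; the model size and the warm-up file name are placeholders:

```python
import torch
import whisper  # openai-whisper

# TensorFloat-32 matmuls/convolutions on Ampere GPUs (RTX 30xx, A100)
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# cuDNN auto-tuning: benchmark kernels for the input shapes actually used
torch.backends.cudnn.benchmark = True

# Load Whisper directly onto the GPU ("base" is a placeholder model size)
model = whisper.load_model("base", device="cuda")

# Warm-up pass: compiles and caches CUDA kernels before the first real
# request, and asks for half-precision (FP16) decoding
with torch.inference_mode():
    model.transcribe("warmup.wav", fp16=True)  # "warmup.wav" is a placeholder
```
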
### ⚠️ AMD GPUs (Limited Support)

- **Requirements**: ROCm 5.x installation
- **Status**: Falls back to CPU unless ROCm is properly configured
- **To enable AMD GPU**:

```bash
# Install PyTorch with ROCm support
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6
```

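After installing the ROCm build, you can confirm that PyTorch actually sees the card. ROCm builds expose AMD GPUs through the regular `torch.cuda` API; these are generic PyTorch calls, not Talk2Me code:

```python
import torch

# ROCm builds of PyTorch report AMD GPUs through the torch.cuda namespace
print(torch.cuda.is_available())   # True once ROCm is configured correctly
print(torch.version.hip)           # HIP/ROCm version string (None on CUDA builds)
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```
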
- **Limitations**:
  - No cuDNN optimizations
  - May have compatibility issues
  - Performance varies by GPU model

### ✅ Apple Silicon (M1/M2/M3)

- **Requirements**: macOS 12.3+
- **Status**: Uses Metal Performance Shaders (MPS)
- **Optimizations** (see the sketch after this list):
  - Native Metal acceleration
  - Unified memory architecture benefits
  - No FP16 (not well supported on MPS yet)

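A minimal sketch of MPS device selection in PyTorch (illustrative only; it keeps tensors in FP32, matching the limitation noted above):

```python
import torch

# Prefer Metal Performance Shaders when available, otherwise fall back to CPU
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

# Keep computation in FP32 on MPS; FP16 support is still incomplete there
x = torch.randn(1, 80, 3000, device=device, dtype=torch.float32)
print(f"Running on {device}, tensor lives on {x.device}")
```
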
### 📊 Performance Comparison

| GPU Type | First Transcription | Subsequent | Notes |
|----------|---------------------|------------|-------|
| NVIDIA RTX 3080 | ~2s | ~0.5s | Full optimizations |
| AMD RX 6800 XT | ~3-4s | ~1-2s | With ROCm |
| Apple M2 | ~2.5s | ~1s | MPS acceleration |
| CPU (i7-12700K) | ~5-10s | ~5-10s | No acceleration |

## Checking Your GPU Status

Run the app and check the logs:

```
INFO: NVIDIA GPU detected - using CUDA acceleration
INFO: GPU memory allocated: 542.00 MB
INFO: Whisper model loaded and optimized for NVIDIA GPU
```

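The same information is available from a Python shell. This is a generic PyTorch check that mirrors a typical CUDA → MPS → CPU detection order, not Talk2Me's own startup code:

```python
import torch

# Report which accelerator PyTorch can see
if torch.cuda.is_available():
    name = torch.cuda.get_device_name(0)
    mem_mb = torch.cuda.memory_allocated() / 1024 ** 2
    print(f"GPU detected: {name} ({mem_mb:.2f} MB currently allocated)")
elif torch.backends.mps.is_available():
    print("Apple Silicon GPU detected: MPS acceleration available")
else:
    print("No GPU detected: transcription will run on the CPU")
```
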
## Troubleshooting

### AMD GPU Not Detected

1. Install ROCm-compatible PyTorch
2. Set environment variable: `export HSA_OVERRIDE_GFX_VERSION=10.3.0`
3. Check with: `rocm-smi`

### NVIDIA GPU Not Used

1. Check CUDA installation: `nvidia-smi`
2. Verify PyTorch CUDA: `python -c "import torch; print(torch.cuda.is_available())"`
3. Install CUDA toolkit if needed (a fuller version check is sketched below)

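If step 2 prints `False`, comparing the CUDA version your PyTorch wheel was built for against the installed driver usually narrows it down (generic PyTorch calls; exact output will vary):

```python
import torch

print(torch.__version__)           # PyTorch version
print(torch.version.cuda)          # CUDA version this wheel was built for (None = CPU-only build)
print(torch.cuda.is_available())   # False often means a CPU-only wheel or a driver mismatch
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```
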
### Apple Silicon Not Accelerated

1. Update macOS to 12.3+
2. Update PyTorch: `pip install --upgrade torch`
3. Check MPS: `python -c "import torch; print(torch.backends.mps.is_available())"`