# GPU Support for Talk2Me

## Current GPU Support Status

### ✅ NVIDIA GPUs (Full Support)
- **Requirements**: CUDA 11.x or 12.x
- **Optimizations**:
  - TensorFloat-32 (TF32) for Ampere GPUs (RTX 30xx, A100)
  - cuDNN auto-tuning
  - Half-precision (FP16) inference
  - CUDA kernel pre-caching
  - Memory pre-allocation

These optimizations correspond to a handful of standard PyTorch flags; a minimal sketch appears at the end of this document.

### ⚠️ AMD GPUs (Limited Support)
- **Requirements**: ROCm 5.x installation
- **Status**: Falls back to CPU unless ROCm is properly configured
- **To enable AMD GPU support**:

```bash
# Install PyTorch with ROCm support
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6
```

- **Limitations**:
  - No cuDNN optimizations
  - May have compatibility issues
  - Performance varies by GPU model

### ✅ Apple Silicon (M1/M2/M3)
- **Requirements**: macOS 12.3+
- **Status**: Uses Metal Performance Shaders (MPS)
- **Optimizations**:
  - Native Metal acceleration
  - Unified memory architecture benefits
  - FP16 disabled (half precision is not yet well supported on MPS)

### 📊 Performance Comparison

| GPU Type | First Transcription | Subsequent Runs | Notes |
|----------|---------------------|-----------------|-------|
| NVIDIA RTX 3080 | ~2s | ~0.5s | Full optimizations |
| AMD RX 6800 XT | ~3-4s | ~1-2s | With ROCm |
| Apple M2 | ~2.5s | ~1s | MPS acceleration |
| CPU (i7-12700K) | ~5-10s | ~5-10s | No acceleration |

The gap between the first and subsequent runs comes largely from one-time setup (model loading, memory allocation, kernel selection); see the warm-up sketch at the end of this document.

## Checking Your GPU Status

Run the app and check the logs:

```
INFO: NVIDIA GPU detected - using CUDA acceleration
INFO: GPU memory allocated: 542.00 MB
INFO: Whisper model loaded and optimized for NVIDIA GPU
```

## Troubleshooting

### AMD GPU Not Detected
1. Install ROCm-compatible PyTorch
2. Set the environment variable: `export HSA_OVERRIDE_GFX_VERSION=10.3.0`
3. Check the GPU with: `rocm-smi`

### NVIDIA GPU Not Used
1. Check the CUDA installation: `nvidia-smi`
2. Verify PyTorch sees CUDA: `python -c "import torch; print(torch.cuda.is_available())"`
3. Install the CUDA toolkit if needed

### Apple Silicon Not Accelerated
1. Update macOS to 12.3+
2. Update PyTorch: `pip install --upgrade torch`
3. Check MPS: `python -c "import torch; print(torch.backends.mps.is_available())"`
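
## Example: Device Selection and Optimization Flags

The NVIDIA optimizations listed above map to standard PyTorch settings. The sketch below is a hypothetical illustration of how a device could be chosen and those flags enabled at startup; the function names are invented for the example, and this is not Talk2Me's actual code.

```python
# Hypothetical startup sketch; not Talk2Me's actual code.
import torch


def pick_device() -> torch.device:
    """Pick the best available backend: CUDA/ROCm, then MPS, then CPU."""
    if torch.cuda.is_available():
        # ROCm builds of PyTorch report through the same torch.cuda API;
        # torch.version.hip is only set on ROCm builds.
        backend = "ROCm" if torch.version.hip else "CUDA"
        print(f"{backend} GPU detected: {torch.cuda.get_device_name(0)}")
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        print("Apple Silicon GPU detected - using MPS acceleration")
        return torch.device("mps")
    print("No GPU acceleration available - using CPU")
    return torch.device("cpu")


def enable_nvidia_optimizations() -> None:
    """Enable TF32 and cuDNN auto-tuning (NVIDIA GPUs only)."""
    torch.backends.cuda.matmul.allow_tf32 = True  # TF32 matmuls on Ampere and newer
    torch.backends.cudnn.allow_tf32 = True        # TF32 inside cuDNN kernels
    torch.backends.cudnn.benchmark = True         # auto-tune convolution algorithms


if __name__ == "__main__":
    device = pick_device()
    if device.type == "cuda" and torch.version.hip is None:
        enable_nvidia_optimizations()
```

TF32 only takes effect on Ampere-class GPUs (RTX 30xx, A100) and newer, and `cudnn.benchmark` helps most when input shapes are stable across calls, as they are for Whisper's fixed 30-second mel spectrogram windows.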
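
## Example: Model Warm-Up

The difference between first and subsequent transcription times in the comparison table is largely one-time work: weight loading, GPU memory allocation, and kernel selection. One way to hide that cost is to run a short throwaway transcription at startup. The sketch below assumes the open-source `openai-whisper` package and FP16 inference on CUDA; treat it as an illustration, not Talk2Me's actual warm-up routine.

```python
# Hypothetical warm-up sketch, assuming the openai-whisper package.
import numpy as np
import torch
import whisper


def load_and_warm_up(model_name: str = "base") -> whisper.Whisper:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = whisper.load_model(model_name, device=device)

    # One second of silence at Whisper's expected 16 kHz sample rate.
    silence = np.zeros(16_000, dtype=np.float32)

    # The throwaway transcription forces kernel selection and memory
    # allocation so the first real request does not pay that cost.
    model.transcribe(silence, fp16=(device == "cuda"))

    if device == "cuda":
        allocated_mb = torch.cuda.memory_allocated() / (1024 ** 2)
        print(f"GPU memory allocated: {allocated_mb:.2f} MB")
    return model
```

After a warm-up like this, request latency is dominated by the model itself, which is roughly what the "Subsequent Runs" column in the table reflects.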