# Memory Management Documentation

This document describes the comprehensive memory management system implemented in Talk2Me to prevent memory leaks and crashes after extended use.

## Overview

Talk2Me implements a dual-layer memory management system:

1. **Backend (Python)**: Manages GPU memory, the Whisper model, and temporary files
2. **Frontend (JavaScript)**: Manages audio blobs, object URLs, and Web Audio contexts

## Memory Leak Issues Addressed

### Backend Memory Leaks

1. **GPU Memory Fragmentation**
   - Whisper model accumulates GPU memory over time
   - Solution: Periodic GPU cache clearing and model reloading
2. **Temporary File Accumulation**
   - Audio files not cleaned up quickly enough under load
   - Solution: Aggressive cleanup with tracking and periodic sweeps
3. **Session Resource Leaks**
   - Long-lived sessions accumulate resources
   - Solution: Integration with the session manager for resource limits

### Frontend Memory Leaks

1. **Audio Blob Leaks**
   - MediaRecorder chunks kept in memory
   - Solution: `SafeMediaRecorder` wrapper with automatic cleanup
2. **Object URL Leaks**
   - URLs created but not revoked
   - Solution: Centralized tracking and automatic revocation
3. **AudioContext Leaks**
   - Contexts created but never closed
   - Solution: `MemoryManager` tracks and closes contexts
4. **MediaStream Leaks**
   - Microphone streams not properly stopped
   - Solution: Automatic track stopping and stream cleanup

## Backend Memory Management

### MemoryManager Class

The `MemoryManager` monitors and manages memory usage:

```python
memory_manager = MemoryManager(app, {
    'memory_threshold_mb': 4096,      # 4 GB process memory limit
    'gpu_memory_threshold_mb': 2048,  # 2 GB GPU memory limit
    'cleanup_interval': 30            # Check every 30 seconds
})
```

### Features

1. **Automatic Monitoring**
   - Background thread checks memory usage
   - Triggers cleanup when thresholds are exceeded
   - Logs statistics every 5 minutes
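The background monitoring loop can be sketched as follows. This is a minimal illustration, not the actual implementation: `MemoryMonitor`, `get_usage_mb`, and `cleanup` are hypothetical stand-ins, and the real `MemoryManager` also watches GPU memory and temporary files.

```python
import threading


class MemoryMonitor:
    """Sketch of a threshold-triggered cleanup loop (hypothetical names)."""

    def __init__(self, threshold_mb, interval, get_usage_mb, cleanup):
        self.threshold_mb = threshold_mb
        self.interval = interval          # seconds between checks
        self.get_usage_mb = get_usage_mb  # callable: current usage in MB
        self.cleanup = cleanup            # callable: invoked over threshold
        self._stop = threading.Event()

    def _run(self):
        # Check usage on each tick; trigger cleanup when over the limit.
        while not self._stop.is_set():
            if self.get_usage_mb() > self.threshold_mb:
                self.cleanup()
            self._stop.wait(self.interval)

    def start(self):
        # Daemon thread so the monitor never blocks process exit.
        threading.Thread(target=self._run, daemon=True).start()

    def stop(self):
        self._stop.set()
```

Running the check on a daemon thread mirrors the design described above: cleanup is triggered asynchronously and the monitor cannot keep the process alive on shutdown.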
2. **GPU Memory Management**
   - Clears CUDA cache after each operation
   - Reloads Whisper model if fragmentation detected
   - Tracks reload count and timing
3. **Temporary File Cleanup**
   - Tracks all temporary files
   - Age-based cleanup (5 minutes normal, 1 minute aggressive)
   - Cleanup on process exit
4. **Context Managers**

   ```python
   with AudioProcessingContext(memory_manager) as ctx:
       # Process audio
       ctx.add_temp_file(temp_path)
   # Files automatically cleaned up on exit
   ```

### Admin Endpoints

- `GET /admin/memory` - View current memory statistics
- `POST /admin/memory/cleanup` - Trigger manual cleanup

## Frontend Memory Management

### MemoryManager Class

Centralized tracking of all browser resources:

```typescript
const memoryManager = MemoryManager.getInstance();

// Register resources
memoryManager.registerAudioContext(context);
memoryManager.registerObjectURL(url);
memoryManager.registerMediaStream(stream);
```

### SafeMediaRecorder

Wrapper for MediaRecorder with automatic cleanup:

```typescript
const recorder = new SafeMediaRecorder();
await recorder.start(constraints);
// Recording...
const blob = await recorder.stop(); // Automatically cleans up
```

### AudioBlobHandler

Safe handling of audio blobs and object URLs:

```typescript
const handler = new AudioBlobHandler(blob);
const url = handler.getObjectURL(); // Tracked automatically
// Use URL...
handler.cleanup(); // Revokes URL and clears references
```

## Memory Thresholds

### Backend Thresholds

| Resource | Default Limit | Configurable Via |
|----------|---------------|------------------|
| Process Memory | 4096 MB | `MEMORY_THRESHOLD_MB` |
| GPU Memory | 2048 MB | `GPU_MEMORY_THRESHOLD_MB` |
| Temp File Age | 300 seconds | Built-in |
| Model Reload Interval | 300 seconds | Built-in |

### Frontend Thresholds

| Resource | Cleanup Trigger |
|----------|-----------------|
| Closed AudioContexts | Every 30 seconds |
| Stopped MediaStreams | Every 30 seconds |
| Orphaned Object URLs | On navigation/unload |

## Best Practices

### Backend

1. **Use Context Managers**

   ```python
   @with_memory_management
   def process_audio():
       ...  # cleanup happens automatically on return
   ```

2. **Register Temporary Files**

   ```python
   register_temp_file(path)
   ctx.add_temp_file(path)
   ```

3. **Clear GPU Memory**

   ```python
   torch.cuda.empty_cache()
   torch.cuda.synchronize()
   ```

### Frontend

1. **Use Safe Wrappers**

   ```typescript
   // Don't use raw MediaRecorder
   const recorder = new SafeMediaRecorder();
   ```

2. **Clean Up Handlers**

   ```typescript
   if (audioHandler) {
       audioHandler.cleanup();
   }
   ```

3. **Register All Resources**

   ```typescript
   const context = new AudioContext();
   memoryManager.registerAudioContext(context);
   ```

## Monitoring

### Backend Monitoring

```bash
# View memory stats
curl -H "X-Admin-Token: token" http://localhost:5005/admin/memory
```

Example response:

```json
{
  "memory": {
    "process_mb": 850.5,
    "system_percent": 45.2,
    "gpu_mb": 1250.0,
    "gpu_percent": 61.0
  },
  "temp_files": {
    "count": 5,
    "size_mb": 12.5
  },
  "model": {
    "reload_count": 2,
    "last_reload": "2024-01-15T10:30:00"
  }
}
```

### Frontend Monitoring

```javascript
// Get memory stats
const stats = memoryManager.getStats();
console.log('Active contexts:', stats.audioContexts);
console.log('Object URLs:', stats.objectURLs);
```

## Troubleshooting

### High Memory Usage

1. **Check Current Usage**

   ```bash
   curl -H "X-Admin-Token: token" http://localhost:5005/admin/memory
   ```
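When scripting this check, the JSON returned by `/admin/memory` can be compared against the configured backend thresholds. A minimal sketch, assuming the response shape shown under Backend Monitoring (`needs_cleanup` is a hypothetical helper, not part of the API):

```python
def needs_cleanup(stats, memory_threshold_mb=4096, gpu_threshold_mb=2048):
    """Return True if process or GPU memory in an /admin/memory response
    exceeds its threshold (defaults match the backend limits)."""
    memory = stats["memory"]
    return (memory["process_mb"] > memory_threshold_mb
            or memory["gpu_mb"] > gpu_threshold_mb)
```

Feeding it the sample response shown above returns `False`, since 850.5 MB and 1250.0 MB are both under their limits; a `True` result is the cue to trigger the manual cleanup endpoint.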
2. **Trigger Manual Cleanup**

   ```bash
   curl -X POST -H "X-Admin-Token: token" \
       http://localhost:5005/admin/memory/cleanup
   ```

3. **Check Logs**

   ```bash
   grep "Memory" logs/talk2me.log
   grep "GPU memory" logs/talk2me.log
   ```

### Memory Leak Symptoms

1. **Backend**
   - Process memory continuously increasing
   - GPU memory not returning to baseline
   - Temp files accumulating in upload folder
   - Slower transcription over time
2. **Frontend**
   - Browser tab memory increasing
   - Page becoming unresponsive
   - Audio playback issues
   - Console errors about contexts

### Debug Mode

Enable debug logging:

```python
# Backend
app.config['DEBUG_MEMORY'] = True
```

```javascript
// Frontend (in console)
localStorage.setItem('DEBUG_MEMORY', 'true');
```

## Performance Impact

Memory management adds minimal overhead:

- Backend: ~30 ms per cleanup cycle
- Frontend: <5 ms per resource registration
- Cleanup operations are non-blocking
- Model reloading takes ~2-3 seconds (rare)

## Future Enhancements

1. **Predictive Cleanup**: Clean resources based on usage patterns
2. **Memory Pooling**: Reuse audio buffers and contexts
3. **Distributed Memory**: Share memory stats across instances
4. **Alert System**: Notify admins of memory issues
5. **Auto-scaling**: Scale resources based on memory pressure