talk2me/MEMORY_MANAGEMENT.md

# Memory Management Documentation

This document describes the comprehensive memory management system implemented in Talk2Me to prevent memory leaks and crashes after extended use.

## Overview

Talk2Me implements a dual-layer memory management system:
1. **Backend (Python)**: Manages GPU memory, Whisper model, and temporary files
2. **Frontend (JavaScript)**: Manages audio blobs, object URLs, and Web Audio contexts

## Memory Leak Issues Addressed

### Backend Memory Leaks

1. **GPU Memory Fragmentation**
   - Whisper model accumulates GPU memory over time
   - Solution: Periodic GPU cache clearing and model reloading

2. **Temporary File Accumulation**
   - Audio files not cleaned up quickly enough under load
   - Solution: Aggressive cleanup with tracking and periodic sweeps

3. **Session Resource Leaks**
   - Long-lived sessions accumulate resources
   - Solution: Integration with session manager for resource limits

### Frontend Memory Leaks

1. **Audio Blob Leaks**
   - MediaRecorder chunks kept in memory
   - Solution: SafeMediaRecorder wrapper with automatic cleanup

2. **Object URL Leaks**
   - URLs created but not revoked
   - Solution: Centralized tracking and automatic revocation

3. **AudioContext Leaks**
   - Contexts created but never closed
   - Solution: MemoryManager tracks and closes contexts

4. **MediaStream Leaks**
   - Microphone streams not properly stopped
   - Solution: Automatic track stopping and stream cleanup

## Backend Memory Management

### MemoryManager Class

The `MemoryManager` monitors and manages memory usage:

```python
memory_manager = MemoryManager(app, {
    'memory_threshold_mb': 4096,      # 4GB process memory limit
    'gpu_memory_threshold_mb': 2048,  # 2GB GPU memory limit
    'cleanup_interval': 30            # Check every 30 seconds
})
```

### Features

1. **Automatic Monitoring**
   - Background thread checks memory usage
   - Triggers cleanup when thresholds exceeded
   - Logs statistics every 5 minutes

2. **GPU Memory Management**
   - Clears CUDA cache after each operation
   - Reloads Whisper model if fragmentation detected
   - Tracks reload count and timing

3. **Temporary File Cleanup**
   - Tracks all temporary files
   - Age-based cleanup (5 minutes normal, 1 minute aggressive)
   - Cleanup on process exit

4. **Context Managers**
   ```python
   with AudioProcessingContext(memory_manager) as ctx:
       # Process audio
       ctx.add_temp_file(temp_path)
       # Files automatically cleaned up
   ```

### Admin Endpoints

- `GET /admin/memory` - View current memory statistics
- `POST /admin/memory/cleanup` - Trigger manual cleanup

## Frontend Memory Management

### MemoryManager Class

Centralized tracking of all browser resources:

```typescript
const memoryManager = MemoryManager.getInstance();

// Register resources
memoryManager.registerAudioContext(context);
memoryManager.registerObjectURL(url);
memoryManager.registerMediaStream(stream);
```

### SafeMediaRecorder

Wrapper for MediaRecorder with automatic cleanup:

```typescript
const recorder = new SafeMediaRecorder();
await recorder.start(constraints);
// Recording...
const blob = await recorder.stop(); // Automatically cleans up
```

### AudioBlobHandler

Safe handling of audio blobs and object URLs:

```typescript
const handler = new AudioBlobHandler(blob);
const url = handler.getObjectURL(); // Tracked automatically
// Use URL...
handler.cleanup(); // Revokes URL and clears references
```

## Memory Thresholds

### Backend Thresholds

| Resource | Default Limit | Configurable Via |
|----------|--------------|------------------|
| Process Memory | 4096 MB | MEMORY_THRESHOLD_MB |
| GPU Memory | 2048 MB | GPU_MEMORY_THRESHOLD_MB |
| Temp File Age | 300 seconds | Built-in |
| Model Reload Interval | 300 seconds | Built-in |

### Frontend Thresholds

| Resource | Cleanup Trigger |
|----------|----------------|
| Closed AudioContexts | Every 30 seconds |
| Stopped MediaStreams | Every 30 seconds |
| Orphaned Object URLs | On navigation/unload |

## Best Practices

### Backend

1. **Use Context Managers**
   ```python
   @with_memory_management
   def process_audio():
       # Automatic cleanup
   ```

2. **Register Temporary Files**
   ```python
   register_temp_file(path)
   ctx.add_temp_file(path)
   ```

3. **Clear GPU Memory**
   ```python
   torch.cuda.empty_cache()
   torch.cuda.synchronize()
   ```

### Frontend

1. **Use Safe Wrappers**
   ```typescript
   // Don't use raw MediaRecorder
   const recorder = new SafeMediaRecorder();
   ```

2. **Clean Up Handlers**
   ```typescript
   if (audioHandler) {
       audioHandler.cleanup();
   }
   ```

3. **Register All Resources**
   ```typescript
   const context = new AudioContext();
   memoryManager.registerAudioContext(context);
   ```

## Monitoring

### Backend Monitoring

```bash
# View memory stats
curl -H "X-Admin-Token: token" http://localhost:5005/admin/memory

# Response
{
  "memory": {
    "process_mb": 850.5,
    "system_percent": 45.2,
    "gpu_mb": 1250.0,
    "gpu_percent": 61.0
  },
  "temp_files": {
    "count": 5,
    "size_mb": 12.5
  },
  "model": {
    "reload_count": 2,
    "last_reload": "2024-01-15T10:30:00"
  }
}
```

### Frontend Monitoring

```javascript
// Get memory stats
const stats = memoryManager.getStats();
console.log('Active contexts:', stats.audioContexts);
console.log('Object URLs:', stats.objectURLs);
```

## Troubleshooting

### High Memory Usage

1. **Check Current Usage**
   ```bash
   curl -H "X-Admin-Token: token" http://localhost:5005/admin/memory
   ```

2. **Trigger Manual Cleanup**
   ```bash
   curl -X POST -H "X-Admin-Token: token" \
     http://localhost:5005/admin/memory/cleanup
   ```

3. **Check Logs**
   ```bash
   grep "Memory" logs/talk2me.log
   grep "GPU memory" logs/talk2me.log
   ```

### Memory Leak Symptoms

1. **Backend**
   - Process memory continuously increasing
   - GPU memory not returning to baseline
   - Temp files accumulating in upload folder
   - Slower transcription over time

2. **Frontend**
   - Browser tab memory increasing
   - Page becoming unresponsive
   - Audio playback issues
   - Console errors about contexts

### Debug Mode

Enable debug logging:
```python
# Backend
app.config['DEBUG_MEMORY'] = True

# Frontend (in console)
localStorage.setItem('DEBUG_MEMORY', 'true');
```

## Performance Impact

Memory management adds minimal overhead:
- Backend: ~30ms per cleanup cycle
- Frontend: <5ms per resource registration
- Cleanup operations are non-blocking
- Model reloading takes ~2-3 seconds (rare)

## Future Enhancements

1. **Predictive Cleanup**: Clean resources based on usage patterns
2. **Memory Pooling**: Reuse audio buffers and contexts
3. **Distributed Memory**: Share memory stats across instances
4. **Alert System**: Notify admins of memory issues
5. **Auto-scaling**: Scale resources based on memory pressure