talk2me

adelorenzo/talk2me

Fork 0

Commit Graph

Author	SHA1	Message	Date
Adolfo Delorenzo	92fd390866	Add production WSGI server - Flask dev server unsuitable for production load This adds a complete production deployment setup using Gunicorn as the WSGI server, replacing Flask's development server. Key components: - Gunicorn configuration with optimized worker settings - Support for sync, threaded, and async (gevent) workers - Automatic worker recycling to prevent memory leaks - Increased timeouts for audio processing - Production-ready logging and monitoring Deployment options: 1. Docker/Docker Compose for containerized deployment 2. Systemd service for traditional deployment 3. Nginx reverse proxy configuration 4. SSL/TLS support Production features: - wsgi.py entry point for WSGI servers - gunicorn_config.py with production settings - Dockerfile with multi-stage build - docker-compose.yml with full stack (Redis, PostgreSQL) - nginx.conf with caching and security headers - systemd service with security hardening - deploy.sh automated deployment script Configuration: - .env.production template with all settings - Support for environment-based configuration - Separate requirements-prod.txt - Prometheus metrics endpoint (/metrics) Monitoring: - Health check endpoints for liveness/readiness - Prometheus-compatible metrics - Structured logging - Memory usage tracking - Request counting Security: - Non-root user in Docker - Systemd security restrictions - Nginx security headers - File permission hardening - Resource limits Documentation: - Comprehensive PRODUCTION_DEPLOYMENT.md - Scaling strategies - Performance tuning guide - Troubleshooting section Also fixed memory_manager.py GC stats collection error. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-06-03 08:49:32 -06:00
Adolfo Delorenzo	1b9ad03400	Fix potential memory leaks in audio handling - Can crash server after extended use This comprehensive fix addresses memory leaks in both backend and frontend that could cause server crashes after extended use. Backend fixes: - MemoryManager class monitors process and GPU memory usage - Automatic cleanup when thresholds exceeded (4GB process, 2GB GPU) - Whisper model reloading to clear GPU memory fragmentation - Aggressive temporary file cleanup based on age - Context manager for audio processing with guaranteed cleanup - Integration with session manager for resource tracking - Background monitoring thread runs every 30 seconds Frontend fixes: - MemoryManager singleton tracks all browser resources - SafeMediaRecorder wrapper ensures stream cleanup - AudioBlobHandler manages blob lifecycle and object URLs - Automatic cleanup of closed AudioContexts - Proper MediaStream track stopping - Periodic cleanup of orphaned resources - Cleanup on page unload Admin features: - GET /admin/memory - View memory statistics - POST /admin/memory/cleanup - Trigger manual cleanup - Real-time metrics including GPU usage and temp files - Model reload tracking Key improvements: - AudioContext properly closed after use - Object URLs revoked after use - MediaRecorder streams properly stopped - Audio chunks cleared after processing - GPU cache cleared after each transcription - Temp files tracked and cleaned aggressively This prevents the gradual memory increase that could lead to out-of-memory errors or performance degradation after hours of use. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-06-03 08:37:13 -06:00

Author

SHA1

Message

Date

Adolfo Delorenzo

92fd390866

Add production WSGI server - Flask dev server unsuitable for production load

This adds a complete production deployment setup using Gunicorn as the WSGI server, replacing Flask's development server.

Key components:
- Gunicorn configuration with optimized worker settings
- Support for sync, threaded, and async (gevent) workers
- Automatic worker recycling to prevent memory leaks
- Increased timeouts for audio processing
- Production-ready logging and monitoring

Deployment options:
1. Docker/Docker Compose for containerized deployment
2. Systemd service for traditional deployment
3. Nginx reverse proxy configuration
4. SSL/TLS support

Production features:
- wsgi.py entry point for WSGI servers
- gunicorn_config.py with production settings
- Dockerfile with multi-stage build
- docker-compose.yml with full stack (Redis, PostgreSQL)
- nginx.conf with caching and security headers
- systemd service with security hardening
- deploy.sh automated deployment script

Configuration:
- .env.production template with all settings
- Support for environment-based configuration
- Separate requirements-prod.txt
- Prometheus metrics endpoint (/metrics)

Monitoring:
- Health check endpoints for liveness/readiness
- Prometheus-compatible metrics
- Structured logging
- Memory usage tracking
- Request counting

Security:
- Non-root user in Docker
- Systemd security restrictions
- Nginx security headers
- File permission hardening
- Resource limits

Documentation:
- Comprehensive PRODUCTION_DEPLOYMENT.md
- Scaling strategies
- Performance tuning guide
- Troubleshooting section

Also fixed memory_manager.py GC stats collection error.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-06-03 08:49:32 -06:00

Adolfo Delorenzo

1b9ad03400

Fix potential memory leaks in audio handling - Can crash server after extended use

This comprehensive fix addresses memory leaks in both backend and frontend that could cause server crashes after extended use.

Backend fixes:
- MemoryManager class monitors process and GPU memory usage
- Automatic cleanup when thresholds exceeded (4GB process, 2GB GPU)
- Whisper model reloading to clear GPU memory fragmentation
- Aggressive temporary file cleanup based on age
- Context manager for audio processing with guaranteed cleanup
- Integration with session manager for resource tracking
- Background monitoring thread runs every 30 seconds

Frontend fixes:
- MemoryManager singleton tracks all browser resources
- SafeMediaRecorder wrapper ensures stream cleanup
- AudioBlobHandler manages blob lifecycle and object URLs
- Automatic cleanup of closed AudioContexts
- Proper MediaStream track stopping
- Periodic cleanup of orphaned resources
- Cleanup on page unload

Admin features:
- GET /admin/memory - View memory statistics
- POST /admin/memory/cleanup - Trigger manual cleanup
- Real-time metrics including GPU usage and temp files
- Model reload tracking

Key improvements:
- AudioContext properly closed after use
- Object URLs revoked after use
- MediaRecorder streams properly stopped
- Audio chunks cleared after processing
- GPU cache cleared after each transcription
- Temp files tracked and cleaned aggressively

This prevents the gradual memory increase that could lead to out-of-memory errors or performance degradation after hours of use.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-06-03 08:37:13 -06:00

2 Commits