# Voice Language Translator
A mobile-friendly web application that translates spoken language between multiple languages using:
- Gemma 3 open-source LLM via Ollama for translation
- OpenAI Whisper for speech-to-text
- OpenAI Edge TTS for text-to-speech
## Supported Languages
- Arabic
- Armenian
- Azerbaijani
- English
- French
- Georgian
- Kazakh
- Mandarin
- Farsi
- Portuguese
- Russian
- Spanish
- Turkish
- Uzbek
## Setup Instructions

1. Install the required Python packages:

   ```bash
   pip install -r requirements.txt
   ```

2. Configure secrets and environment:

   ```bash
   # Initialize secure secrets management
   python manage_secrets.py init

   # Set required secrets
   python manage_secrets.py set TTS_API_KEY

   # Or use a traditional .env file
   cp .env.example .env
   nano .env
   ```

   ⚠️ **Security Note**: Talk2Me includes encrypted secrets management. See SECURITY.md and SECRETS_MANAGEMENT.md for details.

3. Make sure you have Ollama installed and the Gemma 3 model loaded:

   ```bash
   ollama pull gemma3
   ```

4. Ensure your OpenAI Edge TTS server is running on port 5050.

5. Run the application:

   ```bash
   python app.py
   ```

6. Open your browser and navigate to:

   ```
   http://localhost:8000
   ```
## Usage

1. Select your source language from the dropdown menu
2. Press the microphone button and speak
3. Press the button again to stop recording
4. Wait for the transcription to complete
5. Select your target language
6. Press the "Translate" button
7. Use the play buttons to hear the original or translated text
## Technical Details
- The app uses Flask for the web server
- Audio is processed client-side using the MediaRecorder API
- OpenAI Whisper performs speech recognition, guided by language hints
- Ollama provides access to the Gemma 3 model for translation (sketched below)
- OpenAI Edge TTS delivers natural-sounding speech output
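For illustration, a translation request to the local Ollama server looks roughly like this (a minimal sketch: the prompt wording and response handling are assumptions, not the exact code in app.py):

```python
# Minimal sketch of a translation call to a local Ollama instance.
# Prompt wording and error handling are illustrative assumptions.
import requests

def translate(text: str, source: str, target: str) -> str:
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "gemma3",
            "prompt": f"Translate the following {source} text to {target}. "
                      f"Reply with only the translation:\n\n{text}",
            "stream": False,  # return one JSON object instead of a stream
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["response"].strip()
```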
## CORS Configuration
The application supports Cross-Origin Resource Sharing (CORS) for secure cross-origin usage. See CORS_CONFIG.md for detailed configuration instructions.
Quick setup:
```bash
# Development (allow all origins)
export CORS_ORIGINS="*"

# Production (restrict to specific domains)
export CORS_ORIGINS="https://yourdomain.com,https://app.yourdomain.com"
export ADMIN_CORS_ORIGINS="https://admin.yourdomain.com"
```
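Internally, an origin list like this can be handed straight to flask-cors. A minimal sketch assuming that library (Talk2Me's actual wiring, documented in CORS_CONFIG.md, may differ):

```python
# Sketch: applying CORS_ORIGINS to a Flask app via flask-cors.
import os
from flask import Flask
from flask_cors import CORS

app = Flask(__name__)

origins = os.environ.get("CORS_ORIGINS", "*")
# flask-cors accepts either "*" or an explicit list of allowed origins
CORS(app, origins="*" if origins == "*" else [o.strip() for o in origins.split(",")])
```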
## Connection Retry & Offline Support
Talk2Me handles network interruptions gracefully with automatic retry logic:
- Automatic request queuing during connection loss
- Exponential backoff retry with configurable parameters
- Visual connection status indicators
- Priority-based request processing
See CONNECTION_RETRY.md for detailed documentation.
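The retry behaviour reduces to exponential backoff with a cap and jitter. A compact Python sketch of the idea (the production logic lives in the client-side code; names and defaults here are illustrative):

```python
# Illustrative exponential-backoff retry helper.
import random
import time

def retry_with_backoff(request_fn, max_retries=5, base_delay=0.5, max_delay=30.0):
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except ConnectionError:
            if attempt == max_retries:
                raise  # out of retries; surface the error
            # Exponential backoff: base * 2^attempt, capped, with jitter
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay * random.uniform(0.5, 1.5))
```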
## Rate Limiting
Comprehensive rate limiting protects against DoS attacks and resource exhaustion:
- Token bucket algorithm with sliding window
- Per-endpoint configurable limits
- Automatic IP blocking for abusive clients
- Global request limits and concurrent request throttling
- Request size validation
See RATE_LIMITING.md for detailed documentation.
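Stripped of the sliding-window and IP-blocking layers that rate_limiter.py adds, the core token bucket looks roughly like this:

```python
# Simplified token bucket: capacity refills at a fixed rate, and a
# request is allowed only if enough tokens remain.
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill for elapsed time, never exceeding capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```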
## Session Management
Advanced session management prevents resource leaks from abandoned sessions:
- Automatic tracking of all session resources (audio files, temp files)
- Per-session resource limits (100 files, 100MB)
- Automatic cleanup of idle sessions (15 minutes) and expired sessions (1 hour)
- Real-time monitoring and metrics
- Manual cleanup capabilities for administrators
See SESSION_MANAGEMENT.md for detailed documentation.
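Conceptually, each session records every file it creates and is reaped once idle. A reduced sketch with illustrative names (see session_manager.py for the real implementation):

```python
# Reduced sketch of per-session resource tracking with idle cleanup.
import os
import time

MAX_FILES = 100
MAX_BYTES = 100 * 1024 * 1024  # 100MB per session
IDLE_SECONDS = 15 * 60         # idle sessions reaped after 15 minutes

class Session:
    def __init__(self, session_id: str):
        self.id = session_id
        self.files: list[str] = []
        self.bytes_used = 0
        self.last_seen = time.monotonic()

    def track_file(self, path: str) -> None:
        size = os.path.getsize(path)
        if len(self.files) >= MAX_FILES or self.bytes_used + size > MAX_BYTES:
            raise RuntimeError("session resource limit exceeded")
        self.files.append(path)
        self.bytes_used += size
        self.last_seen = time.monotonic()

    def cleanup(self) -> None:
        for path in self.files:  # delete every tracked temp file
            if os.path.exists(path):
                os.remove(path)
        self.files.clear()

def reap_idle(sessions: dict[str, Session]) -> None:
    now = time.monotonic()
    for sid in [s.id for s in sessions.values() if now - s.last_seen > IDLE_SECONDS]:
        sessions.pop(sid).cleanup()
```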
## Request Size Limits
Comprehensive request size limiting prevents memory exhaustion:
- Global limit: 50MB for any request
- Audio files: 25MB maximum
- JSON payloads: 1MB maximum
- File type detection and enforcement
- Dynamic configuration via admin API
See REQUEST_SIZE_LIMITS.md for detailed documentation.
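In Flask these limits can be enforced before the body is read by checking Content-Length. A minimal sketch (request_size_limiter.py additionally does file-type detection and dynamic admin configuration):

```python
# Minimal Content-Length gate; runs before any request body is parsed.
from flask import Flask, jsonify, request

app = Flask(__name__)

MAX_BODY = 50 * 1024 * 1024  # 50MB global cap
MAX_JSON = 1 * 1024 * 1024   # 1MB for JSON payloads

@app.before_request
def enforce_size_limits():
    length = request.content_length or 0
    limit = MAX_JSON if request.is_json else MAX_BODY
    if length > limit:
        # Returning a response here short-circuits the request
        return jsonify(error="request too large"), 413
```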
## Error Logging
Production-ready error logging system for debugging and monitoring:
- Structured JSON logs for easy parsing
- Multiple log streams (app, errors, access, security, performance)
- Automatic log rotation to prevent disk exhaustion
- Request tracing with unique IDs
- Performance metrics and slow request tracking
- Admin endpoints for log analysis
See ERROR_LOGGING.md for detailed documentation.
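Structured JSON logging with rotation needs nothing beyond the standard library. A minimal sketch of the pattern (error_logger.py adds the separate streams, request IDs, and performance metrics; the log path is illustrative):

```python
# Sketch: JSON-formatted logs with size-based rotation.
import json
import logging
from logging.handlers import RotatingFileHandler

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = RotatingFileHandler("app.log", maxBytes=10 * 1024 * 1024, backupCount=5)
handler.setFormatter(JsonFormatter())
logging.getLogger("talk2me").addHandler(handler)
```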
## Memory Management
Comprehensive memory leak prevention for extended use:
- GPU memory management with automatic cleanup
- Whisper model reloading to prevent fragmentation
- Frontend resource tracking (audio blobs, contexts, streams)
- Automatic cleanup of temporary files
- Memory monitoring and manual cleanup endpoints
See MEMORY_MANAGEMENT.md for detailed documentation.
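A cleanup pass of this kind typically pairs Python garbage collection with releasing the GPU allocator's cache. A sketch assuming PyTorch backs the Whisper model:

```python
# Sketch of a periodic memory cleanup pass.
import gc

def cleanup_memory() -> None:
    gc.collect()  # reclaim unreachable Python objects
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # wait for in-flight GPU work
            torch.cuda.empty_cache()  # release cached CUDA allocations
    except ImportError:
        pass  # CPU-only deployment
```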
## Production Deployment
For production use, deploy with a proper WSGI server:
- Gunicorn with optimized worker configuration
- Nginx reverse proxy with caching
- Docker/Docker Compose support
- Systemd service management
- Comprehensive security hardening
Quick start:
```bash
docker-compose up -d
```
See PRODUCTION_DEPLOYMENT.md for detailed deployment instructions.
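The worker-recycling and timeout behaviour described above maps onto Gunicorn settings along these lines (illustrative values; the shipped gunicorn_config.py may differ):

```python
# Illustrative gunicorn_config.py values.
import multiprocessing

bind = "0.0.0.0:8000"
workers = multiprocessing.cpu_count() * 2 + 1  # common sizing heuristic
worker_class = "gevent"   # or "sync"/"gthread", depending on workload
timeout = 120             # allow long-running audio processing requests
max_requests = 1000       # recycle workers to contain memory growth
max_requests_jitter = 50  # stagger recycling across workers
accesslog = "-"           # log to stdout for containerized deployments
```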
## Mobile Support
The interface is fully responsive and designed to work well on mobile devices.