
Voice Language Translator

A mobile-friendly web application that translates speech between multiple languages using:

  • Gemma 3, an open-source LLM served via Ollama, for translation
  • OpenAI Whisper for speech-to-text
  • OpenAI Edge TTS for text-to-speech

Supported Languages

  • Arabic
  • Armenian
  • Azerbaijani
  • English
  • French
  • Georgian
  • Kazakh
  • Mandarin
  • Farsi
  • Portuguese
  • Russian
  • Spanish
  • Turkish
  • Uzbek

Setup Instructions

  1. Install the required Python packages:

    pip install -r requirements.txt
    
  2. Configure secrets and environment:

    # Initialize secure secrets management
    python manage_secrets.py init
    
    # Set required secrets
    python manage_secrets.py set TTS_API_KEY
    
    # Or use traditional .env file
    cp .env.example .env
    nano .env
    

    ⚠️ Security Note: Talk2Me includes encrypted secrets management. See SECURITY.md and SECRETS_MANAGEMENT.md for details.

  3. Make sure you have Ollama installed and the Gemma 3 model loaded:

    ollama pull gemma3
    
  4. Ensure your OpenAI Edge TTS server is running on port 5050.

  5. Run the application:

    python app.py
    
  6. Open your browser and navigate to:

    http://localhost:8000
    

Usage

  1. Select your source language from the dropdown menu
  2. Press the microphone button and speak
  3. Press the button again to stop recording
  4. Wait for the transcription to complete
  5. Select your target language
  6. Press the "Translate" button
  7. Use the play buttons to hear the original or translated text

Technical Details

  • The app uses Flask for the web server
  • Audio is captured and processed client-side using the MediaRecorder API
  • OpenAI Whisper performs speech-to-text, using language hints for better accuracy
  • Ollama provides access to the Gemma 3 model for translation
  • OpenAI Edge TTS delivers natural-sounding speech output
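
As an illustration of how the translation piece connects, a minimal non-streaming call to Ollama's /api/generate endpoint might look like the sketch below. The function names and prompt wording are hypothetical, not the app's actual code:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_translation_prompt(text, source, target):
    """Compose a prompt asking Gemma 3 to translate between two languages."""
    return (f"Translate the following {source} text to {target}. "
            f"Reply with the translation only.\n\n{text}")

def translate(text, source, target):
    """Send one non-streaming generation request to Ollama and return the reply."""
    payload = json.dumps({
        "model": "gemma3",
        "prompt": build_translation_prompt(text, source, target),
        "stream": False,
    }).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `translate("Hello", "English", "French")` would require a running Ollama instance with the gemma3 model pulled.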

CORS Configuration

The application supports Cross-Origin Resource Sharing (CORS) for secure cross-origin usage. See CORS_CONFIG.md for detailed configuration instructions.

Quick setup:

# Development (allow all origins)
export CORS_ORIGINS="*"

# Production (restrict to specific domains)
export CORS_ORIGINS="https://yourdomain.com,https://app.yourdomain.com"
export ADMIN_CORS_ORIGINS="https://admin.yourdomain.com"
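
The comma-separated list above has to be split before it can be handed to a CORS extension such as Flask-CORS; a small illustrative helper (the function name is hypothetical, not the app's actual code):

```python
import os

def parse_cors_origins(value):
    """Turn a CORS_ORIGINS-style string into either "*" or a list of origins."""
    return "*" if value.strip() == "*" else [o.strip() for o in value.split(",")]

# e.g. CORS(app, origins=parse_cors_origins(os.environ.get("CORS_ORIGINS", "*")))
```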

Connection Retry & Offline Support

Talk2Me handles network interruptions gracefully with automatic retry logic:

  • Automatic request queuing during connection loss
  • Exponential backoff retry with configurable parameters
  • Visual connection status indicators
  • Priority-based request processing

See CONNECTION_RETRY.md for detailed documentation.
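
The retry behaviour described above can be sketched roughly as follows. This is a simplified Python model with hypothetical names; the real logic runs client-side in the browser:

```python
import random
import time

def backoff_delays(base=0.5, factor=2.0, max_delay=30.0, retries=5):
    """Yield capped exponential-backoff delays with a little jitter."""
    delay = base
    for _ in range(retries):
        yield min(delay, max_delay) * (1 + random.random() * 0.1)
        delay *= factor

def send_with_retry(send, request, **kwargs):
    """Try `send(request)`; on connection failure, sleep through the schedule."""
    for delay in backoff_delays(**kwargs):
        try:
            return send(request)
        except ConnectionError:
            time.sleep(delay)
    return send(request)  # final attempt; let any error propagate to the caller
```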

Rate Limiting

Comprehensive rate limiting protects against DoS attacks and resource exhaustion:

  • Token bucket algorithm with sliding window
  • Per-endpoint configurable limits
  • Automatic IP blocking for abusive clients
  • Global request limits and concurrent request throttling
  • Request size validation

See RATE_LIMITING.md for detailed documentation.
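
A token bucket refills tokens at a fixed rate up to a burst capacity, and each request spends one token. A minimal sketch of the algorithm (not the app's actual limiter):

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: `rate` tokens/second, burst of `capacity`."""

    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```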

Session Management

Advanced session management prevents resource leaks from abandoned sessions:

  • Automatic tracking of all session resources (audio files, temp files)
  • Per-session resource limits (100 files, 100MB)
  • Automatic cleanup of idle sessions (15 minutes) and expired sessions (1 hour)
  • Real-time monitoring and metrics
  • Manual cleanup capabilities for administrators

See SESSION_MANAGEMENT.md for detailed documentation.
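
The idle and expiry rules above amount to a simple predicate over two session timestamps; an illustrative sketch (names and data layout hypothetical):

```python
import time

IDLE_LIMIT = 15 * 60   # seconds a session may sit idle before reclamation
MAX_AGE = 60 * 60      # absolute lifetime of a session in seconds

def expired_sessions(sessions, now=None):
    """Return ids of sessions that are idle too long or past their lifetime.

    `sessions` maps id -> {"last_used": timestamp, "created": timestamp}.
    """
    now = time.time() if now is None else now
    return [sid for sid, s in sessions.items()
            if now - s["last_used"] > IDLE_LIMIT or now - s["created"] > MAX_AGE]
```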

Request Size Limits

Comprehensive request size limiting prevents memory exhaustion:

  • Global limit: 50MB for any request
  • Audio files: 25MB maximum
  • JSON payloads: 1MB maximum
  • File type detection and enforcement
  • Dynamic configuration via admin API

See REQUEST_SIZE_LIMITS.md for detailed documentation.
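
The layered limits can be modelled as the minimum of the global cap and a per-type cap; an illustrative check (names hypothetical, not the app's actual middleware):

```python
GLOBAL_LIMIT = 50 * 1024 * 1024                 # 50MB ceiling for any request
LIMITS = {
    "audio": 25 * 1024 * 1024,                  # 25MB for audio uploads
    "json": 1 * 1024 * 1024,                    # 1MB for JSON payloads
}

def check_size(kind, n_bytes):
    """Accept only requests under both the global cap and their per-type cap."""
    limit = min(GLOBAL_LIMIT, LIMITS.get(kind, GLOBAL_LIMIT))
    return n_bytes <= limit
```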

Error Logging

Production-ready error logging system for debugging and monitoring:

  • Structured JSON logs for easy parsing
  • Multiple log streams (app, errors, access, security, performance)
  • Automatic log rotation to prevent disk exhaustion
  • Request tracing with unique IDs
  • Performance metrics and slow request tracking
  • Admin endpoints for log analysis

See ERROR_LOGGING.md for detailed documentation.
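
Structured JSON logs of this kind are typically produced with a custom logging.Formatter; a minimal sketch of the idea (not the app's actual formatter):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit each log record as a single JSON line for easy parsing."""

    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })
```

Attaching it to a handler (`handler.setFormatter(JsonFormatter())`) makes every stream emit one JSON object per line.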

Memory Management

Comprehensive memory leak prevention for extended use:

  • GPU memory management with automatic cleanup
  • Whisper model reloading to prevent fragmentation
  • Frontend resource tracking (audio blobs, contexts, streams)
  • Automatic cleanup of temporary files
  • Memory monitoring and manual cleanup endpoints

See MEMORY_MANAGEMENT.md for detailed documentation.
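
A combined cleanup pass usually chains Python's garbage collector with a GPU cache flush when PyTorch with CUDA is present; an illustrative sketch (the function name is hypothetical):

```python
import gc

def release_memory():
    """Run a GC pass and, if PyTorch with CUDA is available, free cached GPU memory."""
    collected = gc.collect()
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    except ImportError:
        pass  # CPU-only deployment: nothing further to release
    return collected
```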

Production Deployment

For production use, deploy with a proper WSGI server:

  • Gunicorn with optimized worker configuration
  • Nginx reverse proxy with caching
  • Docker/Docker Compose support
  • Systemd service management
  • Comprehensive security hardening

Quick start:

docker-compose up -d

See PRODUCTION_DEPLOYMENT.md for detailed deployment instructions.
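
For orientation, a gunicorn_config.py matching the features listed above could look roughly like this (illustrative values, not necessarily the shipped file):

```python
import multiprocessing

bind = "0.0.0.0:8000"
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "gevent"        # async workers for concurrent audio requests
timeout = 300                  # generous timeout for long transcriptions
max_requests = 1000            # recycle workers to curb memory growth
max_requests_jitter = 50       # stagger recycling across workers
accesslog = "-"                # log to stdout for container-friendly capture
errorlog = "-"
```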

Mobile Support

The interface is fully responsive and designed to work well on mobile devices.