Talk2Me - Real-Time Voice Language Translator

A production-ready, mobile-friendly web application that provides real-time translation of speech between multiple languages.

Features

  • Real-time Speech Recognition: Powered by OpenAI Whisper with GPU acceleration
  • Advanced Translation: Uses the open-source Gemma 3 LLM via Ollama
  • Natural Text-to-Speech: OpenAI Edge TTS for lifelike voice output
  • Progressive Web App: Full offline support with service workers
  • Multi-Speaker Support: Track and translate conversations with multiple participants
  • Enterprise Security: Comprehensive rate limiting, session management, and encrypted secrets
  • Production Ready: Docker support, load balancing, and extensive monitoring

Table of Contents

  • Supported Languages
  • Quick Start
  • Installation
  • Configuration
  • Security Features
  • Production Deployment
  • API Documentation
  • Development
  • Monitoring & Operations
  • Performance Tuning
  • Troubleshooting
  • Contributing
  • License
  • Acknowledgments
  • Support

Supported Languages

  • Arabic
  • Armenian
  • Azerbaijani
  • English
  • Farsi
  • French
  • Georgian
  • Kazakh
  • Mandarin
  • Portuguese
  • Russian
  • Spanish
  • Turkish
  • Uzbek

Quick Start

# Clone the repository
git clone https://github.com/yourusername/talk2me.git
cd talk2me

# Install dependencies
pip install -r requirements.txt
npm install
npm run build  # Build TypeScript files

# Initialize secure configuration
python manage_secrets.py init
python manage_secrets.py set TTS_API_KEY your-api-key-here

# Ensure Ollama is running with Gemma
ollama pull gemma2:9b
ollama pull gemma3:27b

# Start the application
python app.py

Open your browser and navigate to http://localhost:5005

Installation

Prerequisites

  • Python 3.8+
  • Node.js 14+
  • Ollama (for LLM translation)
  • OpenAI Edge TTS server
  • Optional: NVIDIA GPU with CUDA, AMD GPU with ROCm, or Apple Silicon

Detailed Setup

  1. Install Python dependencies:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt
    
  2. Install Node.js dependencies:

    npm install
    npm run build  # Build TypeScript files
    
  3. Configure GPU Support (Optional):

    # For NVIDIA GPUs
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
    
    # For AMD GPUs (ROCm)
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
    
    # For Apple Silicon
    pip install torch torchvision torchaudio
    
  4. Set up Ollama:

    # Install Ollama (https://ollama.ai)
    curl -fsSL https://ollama.ai/install.sh | sh
    
    # Pull required models
    ollama pull gemma2:9b    # Faster, for streaming
    ollama pull gemma3:27b   # Better quality
    
  5. Configure TTS Server: Ensure your OpenAI Edge TTS server is running; by default it is expected at http://localhost:5050

Configuration

Secrets and Environment Variables

Talk2Me uses encrypted secrets management for sensitive configuration. You can use either the secure secrets system or traditional environment variables.

# Initialize the secrets system
python manage_secrets.py init

# Set required secrets
python manage_secrets.py set TTS_API_KEY
python manage_secrets.py set TTS_SERVER_URL
python manage_secrets.py set ADMIN_TOKEN

# List all secrets
python manage_secrets.py list

# Rotate encryption keys
python manage_secrets.py rotate

Using Environment Variables

Create a .env file:

# Core Configuration
TTS_API_KEY=your-api-key-here
TTS_SERVER_URL=http://localhost:5050/v1/audio/speech
ADMIN_TOKEN=your-secure-admin-token

# CORS Configuration
CORS_ORIGINS=https://yourdomain.com,https://app.yourdomain.com
ADMIN_CORS_ORIGINS=https://admin.yourdomain.com

# Security Settings
SECRET_KEY=your-secret-key-here
MAX_CONTENT_LENGTH=52428800  # 50MB
SESSION_LIFETIME=3600  # 1 hour
RATE_LIMIT_STORAGE_URL=redis://localhost:6379/0

# Performance Tuning
WHISPER_MODEL_SIZE=base
GPU_MEMORY_THRESHOLD_MB=2048
MEMORY_CLEANUP_INTERVAL=30

Advanced Configuration

CORS Settings

# Development (allow all origins)
export CORS_ORIGINS="*"

# Production (restrict to specific domains)
export CORS_ORIGINS="https://yourdomain.com,https://app.yourdomain.com"
export ADMIN_CORS_ORIGINS="https://admin.yourdomain.com"
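
For reference, a minimal sketch of how these origins could be wired into a Flask app, assuming the Flask-CORS extension (illustrative, not necessarily the project's actual wiring):

import os
from flask import Flask
from flask_cors import CORS

app = Flask(__name__)

# "*" allows all origins (development); a comma-separated list restricts them
origins = os.environ.get("CORS_ORIGINS", "*")
CORS(app, origins=origins if origins == "*" else origins.split(","))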

Rate Limiting

Configure per-endpoint rate limits:

# In your config or via admin API
RATE_LIMITS = {
    'default': {'requests_per_minute': 30, 'requests_per_hour': 500},
    'transcribe': {'requests_per_minute': 10, 'requests_per_hour': 100},
    'translate': {'requests_per_minute': 20, 'requests_per_hour': 300}
}

Session Management

SESSION_CONFIG = {
    'max_file_size_mb': 100,
    'max_files_per_session': 100,
    'idle_timeout_minutes': 15,
    'max_lifetime_minutes': 60
}

Security Features

1. Rate Limiting

Comprehensive DoS protection with:

  • Token bucket algorithm with sliding window
  • Per-endpoint configurable limits
  • Automatic IP blocking for abusive clients
  • Request size validation

# Check rate limit status
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/admin/rate-limits

# Block an IP
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"ip": "192.168.1.100", "duration": 3600}' \
  http://localhost:5005/admin/block-ip
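
For intuition, here is a minimal sketch of the token-bucket algorithm listed above (a simplified illustration, not the project's implementation):

import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# 30 requests per minute = 0.5 tokens per second
bucket = TokenBucket(rate=0.5, capacity=30)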

2. Secrets Management

  • AES-128 encryption for sensitive data
  • Automatic key rotation
  • Audit logging
  • Platform-specific secure storage

# View audit log
python manage_secrets.py audit

# Backup secrets
python manage_secrets.py export --output backup.enc

# Restore from backup
python manage_secrets.py import --input backup.enc
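
The README does not pin down the cipher implementation, but a typical Python approach uses the cryptography library's Fernet (AES-128 in CBC mode with HMAC), roughly:

from cryptography.fernet import Fernet

key = Fernet.generate_key()            # in practice, loaded from secure storage
f = Fernet(key)
token = f.encrypt(b"my-tts-api-key")   # encrypted secret at rest
assert f.decrypt(token) == b"my-tts-api-key"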

3. Session Management

  • Automatic resource tracking
  • Per-session limits (100 files, 100MB)
  • Idle session cleanup (15 minutes)
  • Real-time monitoring

# View active sessions
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/admin/sessions

# Clean up specific session
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
  http://localhost:5005/admin/sessions/SESSION_ID/cleanup
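
Conceptually, idle cleanup just evicts sessions whose last activity exceeds the timeout. A sketch under that assumption (the real session store is more involved):

import time

def cleanup_idle_sessions(sessions, idle_timeout=15 * 60):
    # sessions: dict mapping session_id -> {"last_active": <unix time>, ...}
    now = time.time()
    for sid, sess in list(sessions.items()):
        if now - sess["last_active"] > idle_timeout:
            sessions.pop(sid)  # release the session and its tracked resources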

4. Request Size Limits

  • Global limit: 50MB
  • Audio files: 25MB
  • JSON payloads: 1MB
  • Dynamic configuration

# Update size limits
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"max_audio_size": "30MB"}' \
  http://localhost:5005/admin/size-limits
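
In Flask, the global cap is typically enforced via MAX_CONTENT_LENGTH, with tighter per-endpoint checks layered on top. A sketch with illustrative limits:

from flask import Flask, abort, request

app = Flask(__name__)
app.config["MAX_CONTENT_LENGTH"] = 50 * 1024 * 1024  # global 50MB cap

@app.before_request
def limit_audio_uploads():
    # Stricter 25MB cap for audio uploads
    if request.path == "/transcribe" and (request.content_length or 0) > 25 * 1024 * 1024:
        abort(413)  # Payload Too Large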

Production Deployment

Docker Deployment

# Build and run with Docker Compose (CPU only)
docker-compose up -d

# With NVIDIA GPU support
docker-compose -f docker-compose.yml -f docker-compose.nvidia.yml up -d

# With AMD GPU support (ROCm)
docker-compose -f docker-compose.yml -f docker-compose.amd.yml up -d

# With Apple Silicon support
docker-compose -f docker-compose.yml -f docker-compose.apple.yml up -d

# Scale web workers
docker-compose up -d --scale talk2me=4

# View logs
docker-compose logs -f talk2me

Docker Compose Configuration

Choose the appropriate configuration based on your GPU:

NVIDIA GPU Configuration

version: '3.8'
services:
  talk2me:
    build: .
    ports:
      - "5005:5005"
    environment:
      - GUNICORN_WORKERS=4
      - GUNICORN_THREADS=2
    volumes:
      - ./logs:/app/logs
      - whisper-cache:/root/.cache/whisper
    deploy:
      resources:
        limits:
          memory: 4G
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

AMD GPU Configuration (ROCm)

version: '3.8'
services:
  talk2me:
    build: .
    ports:
      - "5005:5005"
    environment:
      - GUNICORN_WORKERS=4
      - GUNICORN_THREADS=2
      - HSA_OVERRIDE_GFX_VERSION=10.3.0  # Adjust for your GPU
    volumes:
      - ./logs:/app/logs
      - whisper-cache:/root/.cache/whisper
      - /dev/kfd:/dev/kfd  # ROCm KFD interface
      - /dev/dri:/dev/dri  # Direct Rendering Interface
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - video
      - render
    deploy:
      resources:
        limits:
          memory: 4G

Apple Silicon Configuration

version: '3.8'
services:
  talk2me:
    build: .
    platform: linux/arm64/v8  # For M1/M2 Macs
    ports:
      - "5005:5005"
    environment:
      - GUNICORN_WORKERS=4
      - GUNICORN_THREADS=2
      - PYTORCH_ENABLE_MPS_FALLBACK=1  # Enable MPS fallback
    volumes:
      - ./logs:/app/logs
      - whisper-cache:/root/.cache/whisper
    deploy:
      resources:
        limits:
          memory: 4G

CPU-Only Configuration

version: '3.8'
services:
  talk2me:
    build: .
    ports:
      - "5005:5005"
    environment:
      - GUNICORN_WORKERS=4
      - GUNICORN_THREADS=2
      - OMP_NUM_THREADS=4  # OpenMP threads for CPU
    volumes:
      - ./logs:/app/logs
      - whisper-cache:/root/.cache/whisper
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: '4.0'

Nginx Configuration

upstream talk2me {
    least_conn;
    server web1:5005 weight=1 max_fails=3 fail_timeout=30s;
    server web2:5005 weight=1 max_fails=3 fail_timeout=30s;
}

server {
    listen 443 ssl http2;
    server_name talk2me.yourdomain.com;
    
    ssl_certificate /etc/ssl/certs/talk2me.crt;
    ssl_certificate_key /etc/ssl/private/talk2me.key;
    
    client_max_body_size 50M;
    
    location / {
        proxy_pass http://talk2me;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        
        # WebSocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
    
    # Cache static assets
    location /static/ {
        alias /app/static/;
        expires 30d;
        add_header Cache-Control "public, immutable";
    }
}

Systemd Service

[Unit]
Description=Talk2Me Translation Service
After=network.target

[Service]
Type=notify
User=talk2me
Group=talk2me
WorkingDirectory=/opt/talk2me
Environment="PATH=/opt/talk2me/venv/bin"
ExecStart=/opt/talk2me/venv/bin/gunicorn \
    --config gunicorn_config.py \
    --bind 0.0.0.0:5005 \
    app:app
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

API Documentation

Core Endpoints

Transcribe Audio

POST /transcribe
Content-Type: multipart/form-data

audio: (binary)
source_lang: auto|language_code
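
For example, calling the endpoint with Python's requests (field names as documented above; URL assumes a local instance):

import requests

with open("clip.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:5005/transcribe",
        files={"audio": f},
        data={"source_lang": "auto"},
    )
print(resp.json())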

Translate Text

POST /translate
Content-Type: application/json

{
  "text": "Hello world",
  "source_lang": "English",
  "target_lang": "Spanish"
}

Streaming Translation

POST /translate/stream
Content-Type: application/json

{
  "text": "Long text to translate",
  "source_lang": "auto",
  "target_lang": "French"
}

Response: Server-Sent Events stream
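
A sketch of consuming the stream with Python's requests, assuming standard SSE "data:" lines:

import requests

resp = requests.post(
    "http://localhost:5005/translate/stream",
    json={"text": "Long text to translate", "source_lang": "auto", "target_lang": "French"},
    stream=True,
)
for line in resp.iter_lines():
    if line.startswith(b"data: "):
        print(line[len(b"data: "):].decode())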

Text-to-Speech

POST /speak
Content-Type: application/json

{
  "text": "Hola mundo",
  "language": "Spanish"
}
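
Assuming the endpoint returns raw audio bytes, a client might save the response like so (output filename and format are illustrative):

import requests

resp = requests.post(
    "http://localhost:5005/speak",
    json={"text": "Hola mundo", "language": "Spanish"},
)
with open("output.mp3", "wb") as f:
    f.write(resp.content)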

Admin Endpoints

All admin endpoints require the X-Admin-Token header.

Health & Monitoring

  • GET /health - Basic health check
  • GET /health/detailed - Component status
  • GET /metrics - Prometheus metrics
  • GET /admin/memory - Memory usage stats

Session Management

  • GET /admin/sessions - List active sessions
  • GET /admin/sessions/:id - Session details
  • POST /admin/sessions/:id/cleanup - Manual cleanup

Security Controls

  • GET /admin/rate-limits - View rate limits
  • POST /admin/block-ip - Block IP address
  • GET /admin/logs/security - Security events

Development

TypeScript Development

# Install dependencies
npm install

# Development mode with auto-compilation
npm run dev

# Build for production
npm run build

# Type checking
npm run typecheck

Project Structure

talk2me/
├── app.py                 # Main Flask application
├── config.py             # Configuration management
├── requirements.txt      # Python dependencies
├── package.json         # Node.js dependencies
├── tsconfig.json        # TypeScript configuration
├── gunicorn_config.py   # Production server config
├── docker-compose.yml   # Container orchestration
├── static/
│   ├── js/
│   │   ├── src/        # TypeScript source files
│   │   └── dist/       # Compiled JavaScript
│   ├── css/            # Stylesheets
│   └── icons/          # PWA icons
├── templates/          # HTML templates
├── logs/              # Application logs
└── tests/             # Test suite

Key Components

  1. Connection Management (connectionManager.ts)

    • Automatic retry with exponential backoff (see the sketch after this list)
    • Request queuing during offline periods
    • Connection status monitoring
  2. Translation Cache (translationCache.ts)

    • IndexedDB for offline support
    • LRU eviction policy
    • Automatic cache size management
  3. Speaker Management (speakerManager.ts)

    • Multi-speaker conversation tracking
    • Speaker-specific audio handling
    • Conversation export functionality
  4. Error Handling (errorBoundary.ts)

    • Global error catching
    • Automatic error reporting
    • User-friendly error messages
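
The retry pattern behind connectionManager.ts is language-agnostic; a minimal Python sketch of exponential backoff with jitter:

import random
import time

def retry_with_backoff(fn, max_retries=5, base_delay=0.5):
    for attempt in range(max_retries):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_retries - 1:
                raise
            # Delays grow 0.5s, 1s, 2s, ... plus a little jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))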

Running Tests

# Python tests
pytest tests/ -v

# TypeScript tests
npm test

# Integration tests
python test_integration.py

Monitoring & Operations

Logging System

Talk2Me uses structured JSON logging with multiple streams:

logs/
├── talk2me.log      # General application log
├── errors.log       # Error-specific log
├── access.log       # HTTP access log
├── security.log     # Security events
└── performance.log  # Performance metrics

View logs:

# Recent errors
tail -f logs/errors.log | jq '.'

# Security events
grep "rate_limit_exceeded" logs/security.log | jq '.'

# Slow requests
jq 'select(.extra_fields.duration_ms > 1000)' logs/performance.log

Memory Management

Talk2Me includes comprehensive memory leak prevention:

  1. Backend Memory Management

    • GPU memory monitoring
    • Automatic model reloading
    • Temporary file cleanup
  2. Frontend Memory Management

    • Audio blob cleanup
    • WebRTC resource management
    • Event listener cleanup

Monitor memory:

# Check memory stats
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/admin/memory

# Trigger manual cleanup
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
  http://localhost:5005/admin/memory/cleanup
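
Conceptually, the backend GPU check resembles the following (threshold matches GPU_MEMORY_THRESHOLD_MB above; a sketch, not the project's code):

import torch

def gpu_memory_over_threshold(threshold_mb=2048):
    if not torch.cuda.is_available():
        return False
    used_mb = torch.cuda.memory_allocated() / (1024 ** 2)
    return used_mb > threshold_mb

if gpu_memory_over_threshold():
    torch.cuda.empty_cache()  # release cached, unused GPU memory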

Performance Tuning

GPU Optimization

# config.py or environment
GPU_OPTIMIZATIONS = {
    'enabled': True,
    'fp16': True,           # Half precision for 2x speedup
    'batch_size': 1,        # Adjust based on GPU memory
    'num_workers': 2,       # Parallel data loading
    'pin_memory': True      # Faster GPU transfer
}

Whisper Optimization

TRANSCRIBE_OPTIONS = {
    'beam_size': 1,         # Faster inference
    'best_of': 1,           # Disable multiple attempts
    'temperature': 0,       # Deterministic output
    'compression_ratio_threshold': 2.4,
    'logprob_threshold': -1.0,
    'no_speech_threshold': 0.6
}
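
These map directly onto openai-whisper's transcribe() keyword arguments, so usage looks roughly like this (assuming the options dict above is in scope):

import whisper

model = whisper.load_model("base")
result = model.transcribe("clip.wav", **TRANSCRIBE_OPTIONS)
print(result["text"])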

Scaling Considerations

  1. Horizontal Scaling

    • Use Redis for shared rate limiting
    • Configure sticky sessions for WebSocket
    • Share audio files via object storage
  2. Vertical Scaling

    • Increase worker processes
    • Tune thread pool size
    • Allocate more GPU memory
  3. Caching Strategy

    • Cache translations in Redis (see the sketch after this list)
    • Use CDN for static assets
    • Enable HTTP caching headers
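
For item 3, a minimal sketch of a Redis-backed translation cache using redis-py (key scheme and TTL are illustrative):

import hashlib
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def cached_translate(text, source_lang, target_lang, translate_fn, ttl=3600):
    key = "tr:" + hashlib.sha256(f"{source_lang}:{target_lang}:{text}".encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return hit.decode()
    result = translate_fn(text, source_lang, target_lang)
    r.setex(key, ttl, result)  # expire after `ttl` seconds
    return result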

Troubleshooting

Common Issues

GPU Not Detected

# Check CUDA availability
python -c "import torch; print(torch.cuda.is_available())"

# Check GPU memory
nvidia-smi

# For AMD GPUs
rocm-smi

# For Apple Silicon
python -c "import torch; print(torch.backends.mps.is_available())"

High Memory Usage

# Check for memory leaks
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/health/storage

# Manual cleanup
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
  http://localhost:5005/admin/cleanup

CORS Issues

# Test CORS configuration
curl -X OPTIONS http://localhost:5005/transcribe \
  -H "Origin: https://yourdomain.com" \
  -H "Access-Control-Request-Method: POST"

TTS Server Connection

# Check TTS server status
curl http://localhost:5005/check_tts_server

# Update TTS configuration
curl -X POST http://localhost:5005/update_tts_config \
  -H "Content-Type: application/json" \
  -d '{"server_url": "http://localhost:5050/v1/audio/speech", "api_key": "new-key"}'

Debug Mode

Enable debug logging:

export FLASK_ENV=development
export LOG_LEVEL=DEBUG
python app.py

Performance Profiling

# Enable performance logging
export ENABLE_PROFILING=true

# View slow requests
jq 'select(.duration_ms > 1000)' logs/performance.log

Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Setup

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Run tests (pytest && npm test)
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

Code Style

  • Python: Follow PEP 8
  • TypeScript: Use ESLint configuration
  • Commit messages: Use conventional commits

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • OpenAI Whisper team for the amazing speech recognition model
  • Ollama team for making LLMs accessible
  • All contributors who have helped improve Talk2Me

Support