Talk2Me - Real-Time Voice Language Translator

A production-ready, mobile-friendly web application that provides real-time translation of spoken language between multiple languages.

Features

  • Real-time Speech Recognition: Powered by OpenAI Whisper with GPU acceleration
  • Advanced Translation: Using Gemma 3 open-source LLM via Ollama
  • Natural Text-to-Speech: OpenAI Edge TTS for lifelike voice output
  • Progressive Web App: Full offline support with service workers
  • Multi-Speaker Support: Track and translate conversations with multiple participants
  • Enterprise Security: Comprehensive rate limiting, session management, and encrypted secrets
  • Production Ready: Docker support, load balancing, and extensive monitoring

Supported Languages

  • Arabic
  • Armenian
  • Azerbaijani
  • English
  • Farsi
  • French
  • Georgian
  • Kazakh
  • Mandarin
  • Portuguese
  • Russian
  • Spanish
  • Turkish
  • Uzbek

Quick Start

# Clone the repository
git clone https://github.com/yourusername/talk2me.git
cd talk2me

# Install dependencies
pip install -r requirements.txt
npm install

# Initialize secure configuration
python manage_secrets.py init
python manage_secrets.py set TTS_API_KEY your-api-key-here

# Ensure Ollama is running with Gemma
ollama pull gemma2:9b
ollama pull gemma3:27b

# Start the application
python app.py

Open your browser and navigate to http://localhost:5005

Installation

Prerequisites

  • Python 3.8+
  • Node.js 14+
  • Ollama (for LLM translation)
  • OpenAI Edge TTS server
  • Optional: NVIDIA GPU with CUDA, AMD GPU with ROCm, or Apple Silicon

Detailed Setup

  1. Install Python dependencies:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt
    
  2. Install Node.js dependencies:

    npm install
    npm run build  # Build TypeScript files
    
  3. Configure GPU Support (Optional):

    # For NVIDIA GPUs
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
    
    # For AMD GPUs (ROCm)
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
    
    # For Apple Silicon
    pip install torch torchvision torchaudio
    
  4. Set up Ollama:

    # Install Ollama (https://ollama.ai)
    curl -fsSL https://ollama.ai/install.sh | sh
    
    # Pull required models
    ollama pull gemma2:9b    # Faster, for streaming
    ollama pull gemma3:27b   # Better quality
    
  5. Configure TTS Server: Ensure your OpenAI Edge TTS server is running; it is expected at http://localhost:5050 by default

Configuration

Environment Variables

Talk2Me uses encrypted secrets management for sensitive configuration. You can use either the secure secrets system or traditional environment variables.

# Initialize the secrets system
python manage_secrets.py init

# Set required secrets
python manage_secrets.py set TTS_API_KEY
python manage_secrets.py set TTS_SERVER_URL
python manage_secrets.py set ADMIN_TOKEN

# List all secrets
python manage_secrets.py list

# Rotate encryption keys
python manage_secrets.py rotate

Using Environment Variables

Create a .env file:

# Core Configuration
TTS_API_KEY=your-api-key-here
TTS_SERVER_URL=http://localhost:5050/v1/audio/speech
ADMIN_TOKEN=your-secure-admin-token

# CORS Configuration
CORS_ORIGINS=https://yourdomain.com,https://app.yourdomain.com
ADMIN_CORS_ORIGINS=https://admin.yourdomain.com

# Security Settings
SECRET_KEY=your-secret-key-here
MAX_CONTENT_LENGTH=52428800  # 50MB
SESSION_LIFETIME=3600  # 1 hour
RATE_LIMIT_STORAGE_URL=redis://localhost:6379/0

# Performance Tuning
WHISPER_MODEL_SIZE=base
GPU_MEMORY_THRESHOLD_MB=2048
MEMORY_CLEANUP_INTERVAL=30

Advanced Configuration

CORS Settings

# Development (allow all origins)
export CORS_ORIGINS="*"

# Production (restrict to specific domains)
export CORS_ORIGINS="https://yourdomain.com,https://app.yourdomain.com"
export ADMIN_CORS_ORIGINS="https://admin.yourdomain.com"

Rate Limiting

Configure per-endpoint rate limits:

# In your config or via admin API
RATE_LIMITS = {
    'default': {'requests_per_minute': 30, 'requests_per_hour': 500},
    'transcribe': {'requests_per_minute': 10, 'requests_per_hour': 100},
    'translate': {'requests_per_minute': 20, 'requests_per_hour': 300}
}

Session Management

SESSION_CONFIG = {
    'max_file_size_mb': 100,
    'max_files_per_session': 100,
    'idle_timeout_minutes': 15,
    'max_lifetime_minutes': 60
}
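The idle-timeout behavior above can be illustrated with a minimal sketch (a hypothetical helper, not the app's actual session store):

```python
import time

# Mirrors SESSION_CONFIG['idle_timeout_minutes'] above
IDLE_TIMEOUT_MINUTES = 15

def cleanup_idle_sessions(sessions, now=None):
    """Remove sessions whose last activity exceeds the idle timeout.

    `sessions` maps session_id -> {'last_active': epoch_seconds, ...}.
    Returns the list of removed session IDs.
    """
    now = time.time() if now is None else now
    cutoff = now - IDLE_TIMEOUT_MINUTES * 60
    expired = [sid for sid, s in sessions.items() if s['last_active'] < cutoff]
    for sid in expired:
        del sessions[sid]
    return expired
```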

Security Features

1. Rate Limiting

Comprehensive DoS protection with:

  • Token bucket algorithm with sliding window
  • Per-endpoint configurable limits
  • Automatic IP blocking for abusive clients
  • Request size validation

# Check rate limit status
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/admin/rate-limits

# Block an IP
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"ip": "192.168.1.100", "duration": 3600}' \
  http://localhost:5005/admin/block-ip
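For reference, the token bucket idea works like this (an illustrative model, not the server's actual implementation):

```python
import time

class TokenBucket:
    """Minimal token bucket: `rate` tokens refill per second, up to
    `capacity`; each request consumes one token or is rejected."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```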

2. Secrets Management

  • AES-128 encryption for sensitive data
  • Automatic key rotation
  • Audit logging
  • Platform-specific secure storage

# View audit log
python manage_secrets.py audit

# Backup secrets
python manage_secrets.py export --output backup.enc

# Restore from backup
python manage_secrets.py import --input backup.enc
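Conceptually, each secrets operation appends one structured entry to the audit log; a sketch of such a writer (hypothetical format, not the tool's actual schema):

```python
import json
import time

def audit_entry(action, key_name, now=None):
    """Build a JSON-lines audit record for a secrets operation."""
    return json.dumps({
        'timestamp': time.time() if now is None else now,
        'action': action,   # e.g. 'set', 'get', 'rotate'
        'key': key_name,    # never the secret value itself
    })

def append_audit(path, action, key_name):
    """Append one audit record to the log file."""
    with open(path, 'a') as f:
        f.write(audit_entry(action, key_name) + '\n')
```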

3. Session Management

  • Automatic resource tracking
  • Per-session limits (100 files, 100MB)
  • Idle session cleanup (15 minutes)
  • Real-time monitoring

# View active sessions
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/admin/sessions

# Clean up specific session
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
  http://localhost:5005/admin/sessions/SESSION_ID/cleanup

4. Request Size Limits

  • Global limit: 50MB
  • Audio files: 25MB
  • JSON payloads: 1MB
  • Dynamic configuration

# Update size limits
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"max_audio_size": "30MB"}' \
  http://localhost:5005/admin/size-limits
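A size string like "30MB" has to be normalized to bytes server-side; a parser along these lines could do it (hypothetical helper, not the endpoint's actual code):

```python
import re

_UNITS = {'B': 1, 'KB': 1024, 'MB': 1024 ** 2, 'GB': 1024 ** 3}

def parse_size(value):
    """Parse '30MB', '50 MB', or a bare integer into bytes."""
    if isinstance(value, int):
        return value
    m = re.fullmatch(r'\s*(\d+)\s*([KMG]?B)?\s*', value.upper())
    if not m:
        raise ValueError(f'invalid size: {value!r}')
    number, unit = m.groups()
    return int(number) * _UNITS[unit or 'B']
```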

Production Deployment

Docker Deployment

# Build and run with Docker Compose
docker-compose up -d

# Scale web workers
docker-compose up -d --scale web=4

# View logs
docker-compose logs -f web

Docker Compose Configuration

version: '3.8'
services:
  web:
    build: .
    ports:
      - "5005:5005"
    environment:
      - GUNICORN_WORKERS=4
      - GUNICORN_THREADS=2
    volumes:
      - ./logs:/app/logs
      - whisper-cache:/root/.cache/whisper
    deploy:
      resources:
        limits:
          memory: 4G
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Nginx Configuration

upstream talk2me {
    least_conn;
    server web1:5005 weight=1 max_fails=3 fail_timeout=30s;
    server web2:5005 weight=1 max_fails=3 fail_timeout=30s;
}

server {
    listen 443 ssl http2;
    server_name talk2me.yourdomain.com;
    
    ssl_certificate /etc/ssl/certs/talk2me.crt;
    ssl_certificate_key /etc/ssl/private/talk2me.key;
    
    client_max_body_size 50M;
    
    location / {
        proxy_pass http://talk2me;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        
        # WebSocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
    
    # Cache static assets
    location /static/ {
        alias /app/static/;
        expires 30d;
        add_header Cache-Control "public, immutable";
    }
}

Systemd Service

[Unit]
Description=Talk2Me Translation Service
After=network.target

[Service]
Type=notify
User=talk2me
Group=talk2me
WorkingDirectory=/opt/talk2me
Environment="PATH=/opt/talk2me/venv/bin"
ExecStart=/opt/talk2me/venv/bin/gunicorn \
    --config gunicorn_config.py \
    --bind 0.0.0.0:5005 \
    app:app
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

API Documentation

Core Endpoints

Transcribe Audio

POST /transcribe
Content-Type: multipart/form-data

audio: (binary)
source_lang: auto|language_code
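From Python this request can be built with the standard library alone; a sketch using the field names above (the URL and filename are examples):

```python
import urllib.request
import uuid

def build_transcribe_request(audio_bytes, source_lang='auto',
                             url='http://localhost:5005/transcribe'):
    """Build a multipart/form-data POST for the /transcribe endpoint."""
    boundary = uuid.uuid4().hex
    body = b''.join([
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="source_lang"\r\n\r\n{source_lang}\r\n'.encode(),
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="audio"; filename="clip.webm"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode(),
        audio_bytes, b'\r\n',
        f'--{boundary}--\r\n'.encode(),
    ])
    return urllib.request.Request(
        url, data=body,
        headers={'Content-Type': f'multipart/form-data; boundary={boundary}'})

# To send: urllib.request.urlopen(build_transcribe_request(audio_bytes))
```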

Translate Text

POST /translate
Content-Type: application/json

{
  "text": "Hello world",
  "source_lang": "English",
  "target_lang": "Spanish"
}
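An equivalent call from Python (stdlib only; the URL is an example and the response shape is whatever the server returns):

```python
import json
import urllib.request

def build_payload(text, source_lang, target_lang):
    """Serialize the JSON body expected by /translate."""
    return json.dumps({
        'text': text,
        'source_lang': source_lang,
        'target_lang': target_lang,
    }).encode()

def translate(text, source_lang, target_lang,
              url='http://localhost:5005/translate'):
    """POST a translation request and return the decoded JSON response."""
    req = urllib.request.Request(
        url, data=build_payload(text, source_lang, target_lang),
        headers={'Content-Type': 'application/json'})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```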

Streaming Translation

POST /translate/stream
Content-Type: application/json

{
  "text": "Long text to translate",
  "source_lang": "auto",
  "target_lang": "French"
}

Response: Server-Sent Events stream
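The stream is plain Server-Sent Events, so the client only needs to collect `data:` lines until a blank line ends each event; a minimal parser (the exact payload contents are up to the server):

```python
def iter_sse_data(lines):
    """Yield the payload of each SSE event from an iterable of text
    lines; a blank line terminates an event."""
    buffer = []
    for line in lines:
        line = line.rstrip('\n')
        if not line:            # blank line = end of event
            if buffer:
                yield '\n'.join(buffer)
                buffer = []
        elif line.startswith('data:'):
            buffer.append(line[5:].lstrip())
    if buffer:                  # final event without trailing blank line
        yield '\n'.join(buffer)
```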

Text-to-Speech

POST /speak
Content-Type: application/json

{
  "text": "Hola mundo",
  "language": "Spanish"
}

Admin Endpoints

All admin endpoints require the X-Admin-Token header.

Health & Monitoring

  • GET /health - Basic health check
  • GET /health/detailed - Component status
  • GET /metrics - Prometheus metrics
  • GET /admin/memory - Memory usage stats

Session Management

  • GET /admin/sessions - List active sessions
  • GET /admin/sessions/:id - Session details
  • POST /admin/sessions/:id/cleanup - Manual cleanup

Security Controls

  • GET /admin/rate-limits - View rate limits
  • POST /admin/block-ip - Block IP address
  • GET /admin/logs/security - Security events

Development

TypeScript Development

# Install dependencies
npm install

# Development mode with auto-compilation
npm run dev

# Build for production
npm run build

# Type checking
npm run typecheck

Project Structure

talk2me/
├── app.py                 # Main Flask application
├── config.py             # Configuration management
├── requirements.txt      # Python dependencies
├── package.json         # Node.js dependencies
├── tsconfig.json        # TypeScript configuration
├── gunicorn_config.py   # Production server config
├── docker-compose.yml   # Container orchestration
├── static/
│   ├── js/
│   │   ├── src/        # TypeScript source files
│   │   └── dist/       # Compiled JavaScript
│   ├── css/            # Stylesheets
│   └── icons/          # PWA icons
├── templates/          # HTML templates
├── logs/              # Application logs
└── tests/             # Test suite

Key Components

  1. Connection Management (connectionManager.ts)

    • Automatic retry with exponential backoff
    • Request queuing during offline periods
    • Connection status monitoring
  2. Translation Cache (translationCache.ts)

    • IndexedDB for offline support
    • LRU eviction policy
    • Automatic cache size management
  3. Speaker Management (speakerManager.ts)

    • Multi-speaker conversation tracking
    • Speaker-specific audio handling
    • Conversation export functionality
  4. Error Handling (errorBoundary.ts)

    • Global error catching
    • Automatic error reporting
    • User-friendly error messages
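The LRU eviction policy used by translationCache.ts can be sketched in Python (the real cache lives in IndexedDB in the browser; this only illustrates the eviction logic):

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used entry once `capacity` is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # drop least recently used
```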

Running Tests

# Python tests
pytest tests/ -v

# TypeScript tests
npm test

# Integration tests
python test_integration.py

Monitoring & Operations

Logging System

Talk2Me uses structured JSON logging with multiple streams:

logs/
├── talk2me.log      # General application log
├── errors.log       # Error-specific log
├── access.log       # HTTP access log
├── security.log     # Security events
└── performance.log  # Performance metrics
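A formatter that produces such structured JSON lines might look like this (a sketch; the app's actual field schema may differ):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            'timestamp': self.formatTime(record),
            'level': record.levelname,
            'logger': record.name,
            'message': record.getMessage(),
        }
        # Carry through structured context passed via `extra=...`
        if hasattr(record, 'extra_fields'):
            entry['extra_fields'] = record.extra_fields
        return json.dumps(entry)

# Usage:
#   logger.info('slow request', extra={'extra_fields': {'duration_ms': 1200}})
```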

View logs:

# Recent errors
tail -f logs/errors.log | jq '.'

# Security events
grep "rate_limit_exceeded" logs/security.log | jq '.'

# Slow requests
jq 'select(.extra_fields.duration_ms > 1000)' logs/performance.log

Memory Management

Talk2Me includes comprehensive memory leak prevention:

  1. Backend Memory Management

    • GPU memory monitoring
    • Automatic model reloading
    • Temporary file cleanup
  2. Frontend Memory Management

    • Audio blob cleanup
    • WebRTC resource management
    • Event listener cleanup

Monitor memory:

# Check memory stats
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/admin/memory

# Trigger manual cleanup
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
  http://localhost:5005/admin/memory/cleanup

Performance Tuning

GPU Optimization

# config.py or environment
GPU_OPTIMIZATIONS = {
    'enabled': True,
    'fp16': True,           # Half precision for 2x speedup
    'batch_size': 1,        # Adjust based on GPU memory
    'num_workers': 2,       # Parallel data loading
    'pin_memory': True      # Faster GPU transfer
}

Whisper Optimization

TRANSCRIBE_OPTIONS = {
    'beam_size': 1,         # Faster inference
    'best_of': 1,           # Disable multiple attempts
    'temperature': 0,       # Deterministic output
    'compression_ratio_threshold': 2.4,
    'logprob_threshold': -1.0,
    'no_speech_threshold': 0.6
}

Scaling Considerations

  1. Horizontal Scaling

    • Use Redis for shared rate limiting
    • Configure sticky sessions for WebSocket
    • Share audio files via object storage
  2. Vertical Scaling

    • Increase worker processes
    • Tune thread pool size
    • Allocate more GPU memory
  3. Caching Strategy

    • Cache translations in Redis
    • Use CDN for static assets
    • Enable HTTP caching headers
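For caching translations in Redis, a deterministic key derived from the request makes lookups straightforward (a hypothetical key scheme, not the app's actual one):

```python
import hashlib
import json

def translation_cache_key(text, source_lang, target_lang):
    """Derive a stable Redis key for one translation request."""
    payload = json.dumps(
        {'text': text, 'source': source_lang, 'target': target_lang},
        sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()[:16]
    return f'talk2me:translation:{digest}'

# e.g. redis_client.setex(translation_cache_key(text, src, tgt),
#                         3600, translated_text)
```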

Troubleshooting

Common Issues

GPU Not Detected

# Check CUDA availability
python -c "import torch; print(torch.cuda.is_available())"

# Check GPU memory
nvidia-smi

# For AMD GPUs
rocm-smi

# For Apple Silicon
python -c "import torch; print(torch.backends.mps.is_available())"

High Memory Usage

# Check for memory leaks
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/health/storage

# Manual cleanup
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
  http://localhost:5005/admin/cleanup

CORS Issues

# Test CORS configuration
curl -X OPTIONS http://localhost:5005/api/transcribe \
  -H "Origin: https://yourdomain.com" \
  -H "Access-Control-Request-Method: POST"

TTS Server Connection

# Check TTS server status
curl http://localhost:5005/check_tts_server

# Update TTS configuration
curl -X POST http://localhost:5005/update_tts_config \
  -H "Content-Type: application/json" \
  -d '{"server_url": "http://localhost:5050/v1/audio/speech", "api_key": "new-key"}'

Debug Mode

Enable debug logging:

export FLASK_ENV=development
export LOG_LEVEL=DEBUG
python app.py

Performance Profiling

# Enable performance logging
export ENABLE_PROFILING=true

# View slow requests
jq 'select(.duration_ms > 1000)' logs/performance.log

Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Setup

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Run tests (pytest && npm test)
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

Code Style

  • Python: Follow PEP 8
  • TypeScript: Use ESLint configuration
  • Commit messages: Use conventional commits

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • OpenAI Whisper team for the amazing speech recognition model
  • Ollama team for making LLMs accessible
  • All contributors who have helped improve Talk2Me

Support