Talk2Me - Real-Time Voice Language Translator
A production-ready, mobile-friendly web application that provides real-time translation of spoken language between multiple languages.
Features
- Real-time Speech Recognition: Powered by OpenAI Whisper with GPU acceleration
- Advanced Translation: Using Gemma 3 open-source LLM via Ollama
- Natural Text-to-Speech: OpenAI Edge TTS for lifelike voice output
- Progressive Web App: Full offline support with service workers
- Multi-Speaker Support: Track and translate conversations with multiple participants
- Enterprise Security: Comprehensive rate limiting, session management, and encrypted secrets
- Production Ready: Docker support, load balancing, and extensive monitoring
- Admin Dashboard: Real-time analytics, performance monitoring, and system health tracking
Table of Contents
- Supported Languages
- Quick Start
- Installation
- Configuration
- Security Features
- Production Deployment
- API Documentation
- Development
- Monitoring & Operations
- Troubleshooting
- Contributing
Supported Languages
- Arabic
- Armenian
- Azerbaijani
- English
- French
- Georgian
- Kazakh
- Mandarin
- Farsi
- Portuguese
- Russian
- Spanish
- Turkish
- Uzbek
Quick Start
# Clone the repository
git clone https://github.com/yourusername/talk2me.git
cd talk2me
# Install dependencies
pip install -r requirements.txt
npm install
# Initialize secure configuration
python manage_secrets.py init
python manage_secrets.py set TTS_API_KEY your-api-key-here
# Ensure Ollama is running with Gemma
ollama pull gemma2:9b   # Faster, for streaming
ollama pull gemma3:27b  # Better quality
# Start the application
python app.py
Open your browser and navigate to http://localhost:5005
Installation
Prerequisites
- Python 3.8+
- Node.js 14+
- Ollama (for LLM translation)
- OpenAI Edge TTS server
- Optional: NVIDIA GPU with CUDA, AMD GPU with ROCm, or Apple Silicon
Detailed Setup
1. Install Python dependencies:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
2. Install Node.js dependencies:
npm install
npm run build  # Build TypeScript files
3. Configure GPU Support (Optional):
# For NVIDIA GPUs
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# For AMD GPUs (ROCm)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
# For Apple Silicon
pip install torch torchvision torchaudio
4. Set up Ollama:
# Install Ollama (https://ollama.ai)
curl -fsSL https://ollama.ai/install.sh | sh
# Pull required models
ollama pull gemma2:9b   # Faster, for streaming
ollama pull gemma3:27b  # Better quality
5. Configure TTS Server: Ensure your OpenAI Edge TTS server is running. By default it is expected at http://localhost:5050.
Configuration
Environment Variables
Talk2Me uses encrypted secrets management for sensitive configuration. You can use either the secure secrets system or traditional environment variables.
Using Secure Secrets Management (Recommended)
# Initialize the secrets system
python manage_secrets.py init
# Set required secrets
python manage_secrets.py set TTS_API_KEY
python manage_secrets.py set TTS_SERVER_URL
python manage_secrets.py set ADMIN_TOKEN
# List all secrets
python manage_secrets.py list
# Rotate encryption keys
python manage_secrets.py rotate
Using Environment Variables
Create a .env file:
# Core Configuration
TTS_API_KEY=your-api-key-here
TTS_SERVER_URL=http://localhost:5050/v1/audio/speech
ADMIN_TOKEN=your-secure-admin-token
# CORS Configuration
CORS_ORIGINS=https://yourdomain.com,https://app.yourdomain.com
ADMIN_CORS_ORIGINS=https://admin.yourdomain.com
# Security Settings
SECRET_KEY=your-secret-key-here
MAX_CONTENT_LENGTH=52428800 # 50MB
SESSION_LIFETIME=3600 # 1 hour
RATE_LIMIT_STORAGE_URL=redis://localhost:6379/0
# Performance Tuning
WHISPER_MODEL_SIZE=base
GPU_MEMORY_THRESHOLD_MB=2048
MEMORY_CLEANUP_INTERVAL=30
Advanced Configuration
CORS Settings
# Development (allow all origins)
export CORS_ORIGINS="*"
# Production (restrict to specific domains)
export CORS_ORIGINS="https://yourdomain.com,https://app.yourdomain.com"
export ADMIN_CORS_ORIGINS="https://admin.yourdomain.com"
Rate Limiting
Configure per-endpoint rate limits:
# In your config or via admin API
RATE_LIMITS = {
    'default': {'requests_per_minute': 30, 'requests_per_hour': 500},
    'transcribe': {'requests_per_minute': 10, 'requests_per_hour': 100},
    'translate': {'requests_per_minute': 20, 'requests_per_hour': 300}
}
Session Management
SESSION_CONFIG = {
    'max_file_size_mb': 100,
    'max_files_per_session': 100,
    'idle_timeout_minutes': 15,
    'max_lifetime_minutes': 60
}
Security Features
1. Rate Limiting
Comprehensive DoS protection with:
- Token bucket algorithm with sliding window
- Per-endpoint configurable limits
- Automatic IP blocking for abusive clients
- Request size validation
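As a rough illustration of the token bucket approach (a minimal sketch, not Talk2Me's actual limiter; the class and parameter names are hypothetical):
import time

class TokenBucket:
    """Minimal token bucket: holds up to `capacity` tokens, refilled at `refill_rate` per second."""
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, never exceeding capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend one token per request
            return True
        return False

# 30 requests/minute => refill at 0.5 tokens/second with a burst capacity of 30
bucket = TokenBucket(capacity=30, refill_rate=0.5)
The admin endpoints below expose the live limiter state: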
# Check rate limit status
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/admin/rate-limits
# Block an IP
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"ip": "192.168.1.100", "duration": 3600}' \
http://localhost:5005/admin/block-ip
2. Secrets Management
- AES-128 encryption for sensitive data
- Automatic key rotation
- Audit logging
- Platform-specific secure storage
# View audit log
python manage_secrets.py audit
# Backup secrets
python manage_secrets.py export --output backup.enc
# Restore from backup
python manage_secrets.py import --input backup.enc
3. Session Management
- Automatic resource tracking
- Per-session limits (100 files, 100MB)
- Idle session cleanup (15 minutes)
- Real-time monitoring
# View active sessions
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/admin/sessions
# Clean up specific session
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
http://localhost:5005/admin/sessions/SESSION_ID/cleanup
4. Request Size Limits
- Global limit: 50MB
- Audio files: 25MB
- JSON payloads: 1MB
- Dynamic configuration
# Update size limits
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"max_audio_size": "30MB"}' \
http://localhost:5005/admin/size-limits
Production Deployment
Docker Deployment
# Build and run with Docker Compose (CPU only)
docker-compose up -d
# With NVIDIA GPU support
docker-compose -f docker-compose.yml -f docker-compose.nvidia.yml up -d
# With AMD GPU support (ROCm)
docker-compose -f docker-compose.yml -f docker-compose.amd.yml up -d
# With Apple Silicon support
docker-compose -f docker-compose.yml -f docker-compose.apple.yml up -d
# Scale web workers
docker-compose up -d --scale talk2me=4
# View logs
docker-compose logs -f talk2me
Docker Compose Configuration
Choose the appropriate configuration based on your GPU:
NVIDIA GPU Configuration
version: '3.8'
services:
  web:
    build: .
    ports:
      - "5005:5005"
    environment:
      - GUNICORN_WORKERS=4
      - GUNICORN_THREADS=2
    volumes:
      - ./logs:/app/logs
      - whisper-cache:/root/.cache/whisper
    deploy:
      resources:
        limits:
          memory: 4G
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
volumes:
  whisper-cache:  # named volume must be declared at the top level
AMD GPU Configuration (ROCm)
version: '3.8'
services:
  web:
    build: .
    ports:
      - "5005:5005"
    environment:
      - GUNICORN_WORKERS=4
      - GUNICORN_THREADS=2
      - HSA_OVERRIDE_GFX_VERSION=10.3.0  # Adjust for your GPU
    volumes:
      - ./logs:/app/logs
      - whisper-cache:/root/.cache/whisper
      - /dev/kfd:/dev/kfd  # ROCm KFD interface
      - /dev/dri:/dev/dri  # Direct Rendering Interface
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - video
      - render
    deploy:
      resources:
        limits:
          memory: 4G
volumes:
  whisper-cache:  # named volume must be declared at the top level
Apple Silicon Configuration
version: '3.8'
services:
  web:
    build: .
    platform: linux/arm64/v8  # For M1/M2 Macs
    ports:
      - "5005:5005"
    environment:
      - GUNICORN_WORKERS=4
      - GUNICORN_THREADS=2
      - PYTORCH_ENABLE_MPS_FALLBACK=1  # Enable MPS fallback
    volumes:
      - ./logs:/app/logs
      - whisper-cache:/root/.cache/whisper
    deploy:
      resources:
        limits:
          memory: 4G
volumes:
  whisper-cache:  # named volume must be declared at the top level
CPU-Only Configuration
version: '3.8'
services:
  web:
    build: .
    ports:
      - "5005:5005"
    environment:
      - GUNICORN_WORKERS=4
      - GUNICORN_THREADS=2
      - OMP_NUM_THREADS=4  # OpenMP threads for CPU
    volumes:
      - ./logs:/app/logs
      - whisper-cache:/root/.cache/whisper
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: '4.0'
volumes:
  whisper-cache:  # named volume must be declared at the top level
Nginx Configuration
upstream talk2me {
    least_conn;
    server web1:5005 weight=1 max_fails=3 fail_timeout=30s;
    server web2:5005 weight=1 max_fails=3 fail_timeout=30s;
}

server {
    listen 443 ssl http2;
    server_name talk2me.yourdomain.com;

    ssl_certificate /etc/ssl/certs/talk2me.crt;
    ssl_certificate_key /etc/ssl/private/talk2me.key;

    client_max_body_size 50M;

    location / {
        proxy_pass http://talk2me;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;

        # WebSocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    # Cache static assets
    location /static/ {
        alias /app/static/;
        expires 30d;
        add_header Cache-Control "public, immutable";
    }
}
Systemd Service
[Unit]
Description=Talk2Me Translation Service
After=network.target

[Service]
Type=notify
User=talk2me
Group=talk2me
WorkingDirectory=/opt/talk2me
Environment="PATH=/opt/talk2me/venv/bin"
ExecStart=/opt/talk2me/venv/bin/gunicorn \
    --config gunicorn_config.py \
    --bind 0.0.0.0:5005 \
    app:app
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
API Documentation
Core Endpoints
Transcribe Audio
POST /transcribe
Content-Type: multipart/form-data
audio: (binary)
source_lang: auto|language_code
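For programmatic access, the same request can be made with Python's requests library (a sketch; the JSON response shape is an assumption):
import requests

with open('clip.wav', 'rb') as f:
    resp = requests.post(
        'http://localhost:5005/transcribe',
        files={'audio': f},            # multipart file field, as above
        data={'source_lang': 'auto'},
    )
resp.raise_for_status()
print(resp.json())  # assumed to contain the transcription result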
Translate Text
POST /translate
Content-Type: application/json
{
  "text": "Hello world",
  "source_lang": "English",
  "target_lang": "Spanish"
}
Streaming Translation
POST /translate/stream
Content-Type: application/json
{
  "text": "Long text to translate",
  "source_lang": "auto",
  "target_lang": "French"
}
Response: Server-Sent Events stream
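A minimal client sketch for consuming the stream (assumes standard SSE framing with data: lines; the payload inside each event is defined by the server):
import requests

resp = requests.post(
    'http://localhost:5005/translate/stream',
    json={'text': 'Long text to translate', 'source_lang': 'auto', 'target_lang': 'French'},
    stream=True,  # keep the connection open and read events as they arrive
)
for line in resp.iter_lines(decode_unicode=True):
    if line and line.startswith('data: '):
        print(line[len('data: '):])  # one translated chunk per event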
Text-to-Speech
POST /speak
Content-Type: application/json
{
  "text": "Hola mundo",
  "language": "Spanish"
}
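The response can be written straight to a file (a sketch that assumes the endpoint returns raw audio bytes; adjust the extension to whatever format your TTS server produces):
import requests

resp = requests.post(
    'http://localhost:5005/speak',
    json={'text': 'Hola mundo', 'language': 'Spanish'},
)
resp.raise_for_status()
with open('speech.mp3', 'wb') as f:
    f.write(resp.content)  # assumes raw audio in the response body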
Admin Endpoints
All admin endpoints require the X-Admin-Token header.
Health & Monitoring
- GET /health - Basic health check
- GET /health/detailed - Component status
- GET /metrics - Prometheus metrics
- GET /admin/memory - Memory usage stats
Session Management
- GET /admin/sessions - List active sessions
- GET /admin/sessions/:id - Session details
- POST /admin/sessions/:id/cleanup - Manual cleanup
Security Controls
- GET /admin/rate-limits - View rate limits
- POST /admin/block-ip - Block IP address
- GET /admin/logs/security - Security events
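The same endpoints are easy to hit from Python (a sketch; assumes ADMIN_TOKEN is set in the environment):
import os
import requests

headers = {'X-Admin-Token': os.environ['ADMIN_TOKEN']}
limits = requests.get('http://localhost:5005/admin/rate-limits', headers=headers)
limits.raise_for_status()
print(limits.json())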
Admin Dashboard
Talk2Me includes a comprehensive admin analytics dashboard for monitoring and managing the application.
Features
- Real-time Analytics: Monitor requests, active sessions, and error rates
- Performance Metrics: Track response times, throughput, and resource usage
- System Health: Monitor Redis, PostgreSQL, and ML services status
- Language Analytics: View popular language pairs and usage patterns
- Error Analysis: Detailed error tracking with types and trends
- Data Export: Download analytics data in JSON format
Setup
1. Initialize Database:
python init_analytics_db.py
2. Configure Admin Token:
export ADMIN_TOKEN="your-secure-admin-token"
3. Access Dashboard:
- Navigate to https://yourdomain.com/admin
- Enter your admin token
- View real-time analytics
Dashboard Sections
- Overview Cards: Key metrics at a glance
- Request Volume: Visualize traffic patterns
- Operations: Translation and transcription statistics
- Performance: Response time percentiles (P95, P99)
- Error Tracking: Error types and recent issues
- System Health: Component status monitoring
For detailed admin documentation, see ADMIN_DASHBOARD.md.
Development
TypeScript Development
# Install dependencies
npm install
# Development mode with auto-compilation
npm run dev
# Build for production
npm run build
# Type checking
npm run typecheck
Project Structure
talk2me/
├── app.py # Main Flask application
├── config.py # Configuration management
├── requirements.txt # Python dependencies
├── package.json # Node.js dependencies
├── tsconfig.json # TypeScript configuration
├── gunicorn_config.py # Production server config
├── docker-compose.yml # Container orchestration
├── static/
│ ├── js/
│ │ ├── src/ # TypeScript source files
│ │ └── dist/ # Compiled JavaScript
│ ├── css/ # Stylesheets
│ └── icons/ # PWA icons
├── templates/ # HTML templates
├── logs/ # Application logs
└── tests/ # Test suite
Key Components
- Connection Management (connectionManager.ts)
  - Automatic retry with exponential backoff
  - Request queuing during offline periods
  - Connection status monitoring
- Translation Cache (translationCache.ts)
  - IndexedDB for offline support
  - LRU eviction policy
  - Automatic cache size management
- Speaker Management (speakerManager.ts)
  - Multi-speaker conversation tracking
  - Speaker-specific audio handling
  - Conversation export functionality
- Error Handling (errorBoundary.ts)
  - Global error catching
  - Automatic error reporting
  - User-friendly error messages
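The retry behavior in connectionManager.ts is the standard exponential-backoff-with-jitter pattern, sketched here in Python for consistency with this README's other examples (function and parameter names are illustrative):
import random
import time

def retry_with_backoff(fn, max_retries=5, base_delay=0.5, max_delay=30.0):
    """Call fn(), retrying failures with exponentially growing, jittered delays."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retry storms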
Running Tests
# Python tests
pytest tests/ -v
# TypeScript tests
npm test
# Integration tests
python test_integration.py
Monitoring & Operations
Logging System
Talk2Me uses structured JSON logging with multiple streams:
logs/
├── talk2me.log # General application log
├── errors.log # Error-specific log
├── access.log # HTTP access log
├── security.log # Security events
└── performance.log # Performance metrics
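Each stream emits one JSON object per line; a formatter along these lines produces that shape (a hypothetical sketch, not Talk2Me's actual logging setup):
import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            'timestamp': self.formatTime(record),
            'level': record.levelname,
            'logger': record.name,
            'message': record.getMessage(),
        })

handler = logging.FileHandler('logs/talk2me.log')
handler.setFormatter(JsonFormatter())
logging.getLogger('talk2me').addHandler(handler)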
View logs:
# Recent errors
tail -f logs/errors.log | jq '.'
# Security events
grep "rate_limit_exceeded" logs/security.log | jq '.'
# Slow requests
jq 'select(.extra_fields.duration_ms > 1000)' logs/performance.log
Memory Management
Talk2Me includes comprehensive memory leak prevention:
- Backend Memory Management
  - GPU memory monitoring
  - Automatic model reloading
  - Temporary file cleanup
- Frontend Memory Management
  - Audio blob cleanup
  - WebRTC resource management
  - Event listener cleanup
Monitor memory:
# Check memory stats
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/admin/memory
# Trigger manual cleanup
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
http://localhost:5005/admin/memory/cleanup
Performance Tuning
GPU Optimization
# config.py or environment
GPU_OPTIMIZATIONS = {
    'enabled': True,
    'fp16': True,         # Half precision for 2x speedup
    'batch_size': 1,      # Adjust based on GPU memory
    'num_workers': 2,     # Parallel data loading
    'pin_memory': True    # Faster GPU transfer
}
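Device selection for these settings typically looks like the following (a sketch; Talk2Me's actual detection logic may differ):
import torch

if torch.cuda.is_available():
    device = 'cuda'  # NVIDIA CUDA (ROCm builds of PyTorch also report as cuda)
elif torch.backends.mps.is_available():
    device = 'mps'   # Apple Silicon
else:
    device = 'cpu'
use_fp16 = device == 'cuda'  # half precision only helps on GPU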
Whisper Optimization
TRANSCRIBE_OPTIONS = {
    'beam_size': 1,       # Faster inference
    'best_of': 1,         # Disable multiple attempts
    'temperature': 0,     # Deterministic output
    'compression_ratio_threshold': 2.4,
    'logprob_threshold': -1.0,
    'no_speech_threshold': 0.6
}
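With the stock openai-whisper package, these options pass straight through as keyword arguments to transcribe() (a sketch):
import whisper

model = whisper.load_model('base')  # matches WHISPER_MODEL_SIZE=base
result = model.transcribe('audio.wav', **TRANSCRIBE_OPTIONS)
print(result['text'])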
Scaling Considerations
- Horizontal Scaling
  - Use Redis for shared rate limiting
  - Configure sticky sessions for WebSocket
  - Share audio files via object storage
- Vertical Scaling
  - Increase worker processes
  - Tune thread pool size
  - Allocate more GPU memory
- Caching Strategy (see the sketch below)
  - Cache translations in Redis
  - Use CDN for static assets
  - Enable HTTP caching headers
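A Redis-backed translation cache can key on the (source, target, text) triple (a sketch using redis-py; the key scheme and TTL are illustrative):
import hashlib
import redis

r = redis.Redis(host='localhost', port=6379, db=0)

def cached_translate(text, source_lang, target_lang, translate_fn, ttl=86400):
    key = 'tr:' + hashlib.sha256(f'{source_lang}|{target_lang}|{text}'.encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return hit.decode()       # cache hit: skip the LLM call entirely
    result = translate_fn(text, source_lang, target_lang)
    r.setex(key, ttl, result)     # cache the fresh translation for ttl seconds
    return result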
Troubleshooting
Common Issues
GPU Not Detected
# Check CUDA availability
python -c "import torch; print(torch.cuda.is_available())"
# Check GPU memory
nvidia-smi
# For AMD GPUs
rocm-smi
# For Apple Silicon
python -c "import torch; print(torch.backends.mps.is_available())"
High Memory Usage
# Check for memory leaks
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/health/storage
# Manual cleanup
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
http://localhost:5005/admin/cleanup
CORS Issues
# Test CORS configuration
curl -X OPTIONS http://localhost:5005/api/transcribe \
-H "Origin: https://yourdomain.com" \
-H "Access-Control-Request-Method: POST"
TTS Server Connection
# Check TTS server status
curl http://localhost:5005/check_tts_server
# Update TTS configuration
curl -X POST http://localhost:5005/update_tts_config \
-H "Content-Type: application/json" \
-d '{"server_url": "http://localhost:5050/v1/audio/speech", "api_key": "new-key"}'
Debug Mode
Enable debug logging:
export FLASK_ENV=development
export LOG_LEVEL=DEBUG
python app.py
Performance Profiling
# Enable performance logging
export ENABLE_PROFILING=true
# View slow requests
jq 'select(.duration_ms > 1000)' logs/performance.log
Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
Development Setup
1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Make your changes
4. Run tests (pytest && npm test)
5. Commit your changes (git commit -m 'Add amazing feature')
6. Push to the branch (git push origin feature/amazing-feature)
7. Open a Pull Request
Code Style
- Python: Follow PEP 8
- TypeScript: Use ESLint configuration
- Commit messages: Use conventional commits
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- OpenAI Whisper team for the amazing speech recognition model
- Ollama team for making LLMs accessible
- All contributors who have helped improve Talk2Me
Support
- Documentation: Full docs at docs.talk2me.app
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Security: Please report security vulnerabilities to security@talk2me.app