Production Deployment Guide
This guide covers deploying Talk2Me in a production environment using a proper WSGI server.
Overview
The Flask development server is not suitable for production use. This guide covers:
- Gunicorn as the WSGI server
- Nginx as a reverse proxy
- Docker for containerization
- Systemd for process management
- Security best practices
Quick Start with Docker
1. Using Docker Compose
# Clone the repository
git clone https://github.com/your-repo/talk2me.git
cd talk2me
# Create .env file with production settings
cat > .env <<EOF
TTS_API_KEY=your-api-key
ADMIN_TOKEN=your-secure-admin-token
SECRET_KEY=your-secure-secret-key
POSTGRES_PASSWORD=your-secure-db-password
EOF
# Build and start services
docker-compose up -d
# Check status
docker-compose ps
docker-compose logs -f talk2me
2. Using Docker (standalone)
# Build the image
docker build -t talk2me .
# Run the container
docker run -d \
--name talk2me \
-p 5005:5005 \
-e TTS_API_KEY=your-api-key \
-e ADMIN_TOKEN=your-secure-token \
-e SECRET_KEY=your-secure-key \
-v $(pwd)/logs:/app/logs \
talk2me
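Whichever Docker path you use, a quick smoke test against the health endpoint (described under Monitoring below) confirms the container is serving requests:
# Give the workers a few seconds to start, then:
curl http://localhost:5005/health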
Manual Deployment
1. System Requirements
- Ubuntu 20.04+ or similar Linux distribution
- Python 3.8+
- Nginx
- Systemd
- 4GB+ RAM recommended
- GPU (optional, for faster transcription)
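A few quick commands to verify these requirements on the target host (a convenience sketch; output will vary by system):
python3 --version      # expect 3.8 or newer
nginx -v               # confirms Nginx is installed
free -h                # total and available RAM
nvidia-smi             # only relevant if a GPU is present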
2. Installation
Run the deployment script as root:
sudo ./deploy.sh
Or manually:
# Install system dependencies
sudo apt-get update
sudo apt-get install -y python3-pip python3-venv nginx
# Create application user
sudo useradd -m -s /bin/bash talk2me
# Create directories
sudo mkdir -p /opt/talk2me /var/log/talk2me
sudo chown talk2me:talk2me /opt/talk2me /var/log/talk2me
# Copy application files
sudo cp -r . /opt/talk2me/
sudo chown -R talk2me:talk2me /opt/talk2me
# Install Python dependencies
sudo -u talk2me python3 -m venv /opt/talk2me/venv
sudo -u talk2me /opt/talk2me/venv/bin/pip install -r requirements-prod.txt
# Configure and start services
sudo cp talk2me.service /etc/systemd/system/
sudo systemctl enable talk2me
sudo systemctl start talk2me
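The talk2me.service unit copied above ships with the repository. For reference, a minimal unit matching the layout used in this guide might look like the following (an illustrative sketch, not the shipped file, which adds further hardening):
sudo tee /etc/systemd/system/talk2me.service > /dev/null <<'EOF'
[Unit]
Description=Talk2Me production service (Gunicorn)
After=network.target

[Service]
User=talk2me
Group=talk2me
WorkingDirectory=/opt/talk2me
EnvironmentFile=/opt/talk2me/.env
ExecStart=/opt/talk2me/venv/bin/gunicorn -c gunicorn_config.py wsgi:application
ExecReload=/bin/kill -s HUP $MAINPID
Restart=on-failure
# Basic hardening; the shipped unit is stricter
NoNewPrivileges=true
ProtectSystem=full

[Install]
WantedBy=multi-user.target
EOF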
Gunicorn Configuration
The gunicorn_config.py file contains production-ready settings:
Worker Configuration
# Number of worker processes
workers = multiprocessing.cpu_count() * 2 + 1
# Worker timeout (increased for audio processing)
timeout = 120
# Restart workers periodically to prevent memory leaks
max_requests = 1000
max_requests_jitter = 50
Performance Tuning
For different workloads:
# CPU-bound (transcription heavy)
export GUNICORN_WORKERS=8
export GUNICORN_THREADS=1
# I/O-bound (many concurrent requests)
export GUNICORN_WORKERS=4
export GUNICORN_THREADS=4
export GUNICORN_WORKER_CLASS=gthread
# Async (best concurrency)
export GUNICORN_WORKER_CLASS=gevent
export GUNICORN_WORKER_CONNECTIONS=1000
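These variables are read by gunicorn_config.py at startup, so a combination can be tried in the foreground before changing the service (a sketch, assuming the wsgi:application entry point used elsewhere in this guide):
cd /opt/talk2me
export GUNICORN_WORKERS=4 GUNICORN_THREADS=4 GUNICORN_WORKER_CLASS=gthread
./venv/bin/gunicorn -c gunicorn_config.py wsgi:application
# Ctrl+C to stop; then bake the chosen values into the service environment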
Nginx Configuration
Basic Setup
The provided nginx.conf includes:
- Reverse proxy to Gunicorn
- Static file serving
- WebSocket support
- Security headers
- Gzip compression
SSL/TLS Setup
server {
listen 443 ssl http2;
server_name your-domain.com;
ssl_certificate /etc/letsencrypt/live/your-domain.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/your-domain.com/privkey.pem;
# Strong SSL configuration
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
ssl_prefer_server_ciphers off;
# HSTS
add_header Strict-Transport-Security "max-age=63072000" always;
}
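The certificate paths above follow the Let's Encrypt layout; assuming Certbot is used, certificates can be issued and renewal tested like this:
sudo apt-get install -y certbot python3-certbot-nginx
sudo certbot certonly --nginx -d your-domain.com
# Certbot installs a renewal timer; verify it works
sudo certbot renew --dry-run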
Environment Variables
Required
# Security
SECRET_KEY=your-very-secure-secret-key
ADMIN_TOKEN=your-admin-api-token
# TTS Configuration
TTS_API_KEY=your-tts-api-key
TTS_SERVER_URL=http://your-tts-server:5050/v1/audio/speech
# Flask
FLASK_ENV=production
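SECRET_KEY and ADMIN_TOKEN should be long, random values; one way to generate them:
python3 -c "import secrets; print(secrets.token_hex(32))"
# or
openssl rand -hex 32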
Optional
# Performance
GUNICORN_WORKERS=4
GUNICORN_THREADS=2
MEMORY_THRESHOLD_MB=4096
GPU_MEMORY_THRESHOLD_MB=2048
# Database (for session storage)
DATABASE_URL=postgresql://user:pass@localhost/talk2me
REDIS_URL=redis://localhost:6379/0
# Monitoring
SENTRY_DSN=your-sentry-dsn
Monitoring
Health Checks
# Basic health check
curl http://localhost:5005/health
# Detailed health check
curl http://localhost:5005/health/detailed
# Memory usage
curl -H "X-Admin-Token: your-token" http://localhost:5005/admin/memory
Logs
# Application logs
tail -f /var/log/talk2me/talk2me.log
# Error logs
tail -f /var/log/talk2me/errors.log
# Gunicorn logs
journalctl -u talk2me -f
# Nginx logs
tail -f /var/log/nginx/access.log
tail -f /var/log/nginx/error.log
Metrics
With the Prometheus client library installed:
# Prometheus metrics endpoint
curl http://localhost:5005/metrics
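The /metrics endpoint depends on the Prometheus client library; if it is not already pinned in requirements-prod.txt, it can be added to the virtualenv like this:
sudo -u talk2me /opt/talk2me/venv/bin/pip install prometheus-client
sudo systemctl restart talk2me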
Scaling
Horizontal Scaling
For multiple servers:
- Use Redis for session storage
- Use PostgreSQL for persistent data
- Load balance with Nginx:
upstream talk2me_backends {
least_conn;
server server1:5005 weight=1;
server server2:5005 weight=1;
server server3:5005 weight=1;
}
Vertical Scaling
Adjust based on load:
# High memory usage
MEMORY_THRESHOLD_MB=8192
GPU_MEMORY_THRESHOLD_MB=4096
# More workers
GUNICORN_WORKERS=16
GUNICORN_THREADS=4
# Larger file limits (in nginx.conf)
client_max_body_size 100M;
Security
Firewall
# Allow only necessary ports
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow 22/tcp
sudo ufw enable
File Permissions
# Secure file permissions
sudo chmod 750 /opt/talk2me
sudo chmod 640 /opt/talk2me/.env
sudo chmod 755 /opt/talk2me/static
AppArmor/SELinux
Create security profiles to restrict application access.
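On Ubuntu, an AppArmor profile can be bootstrapped with the standard tooling (a starting point only; review and tighten the generated profile before enforcing it):
sudo apt-get install -y apparmor-utils
# Generate a profile interactively while exercising the application
sudo aa-genprof /opt/talk2me/venv/bin/gunicorn
# Start in complain mode to log would-be denials without blocking
sudo aa-complain /opt/talk2me/venv/bin/gunicorn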
Backup
Database Backup
# PostgreSQL
pg_dump talk2me > backup.sql
# Redis
redis-cli BGSAVE
Application Backup
# Backup application and logs
tar -czf talk2me-backup.tar.gz \
/opt/talk2me \
/var/log/talk2me \
/etc/systemd/system/talk2me.service \
/etc/nginx/sites-available/talk2me
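The ./backup.sh script referenced in the update procedure below is not listed in this guide; a minimal sketch combining the commands above could look like this (adjust paths and retention to your environment):
#!/bin/bash
set -euo pipefail

STAMP=$(date +%F)
BACKUP_DIR=/var/backups/talk2me
mkdir -p "$BACKUP_DIR"

# Database dump
pg_dump talk2me > "$BACKUP_DIR/talk2me-$STAMP.sql"

# Ask Redis to persist its dataset in the background
redis-cli BGSAVE

# Application, logs, and service/proxy configuration
tar -czf "$BACKUP_DIR/talk2me-$STAMP.tar.gz" \
    /opt/talk2me \
    /var/log/talk2me \
    /etc/systemd/system/talk2me.service \
    /etc/nginx/sites-available/talk2me

# Keep two weeks of backups
find "$BACKUP_DIR" -type f -mtime +14 -delete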
Troubleshooting
Service Won't Start
# Check service status
systemctl status talk2me
# Check logs
journalctl -u talk2me -n 100
# Test configuration
sudo -u talk2me /opt/talk2me/venv/bin/gunicorn --check-config wsgi:application
High Memory Usage
# Trigger cleanup
curl -X POST -H "X-Admin-Token: token" http://localhost:5005/admin/memory/cleanup
# Restart workers
systemctl reload talk2me
Slow Response Times
- Check worker count
- Enable async workers
- Check GPU availability
- Review nginx buffering settings (quick diagnostic checks below)
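A few quick checks covering the points above (a sketch; service and device names may differ):
# Count running Gunicorn workers and see their memory use
ps -o pid,rss,cmd -C gunicorn
# Confirm the GPU is visible and being used
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv
# Time a health-check request end to end
curl -s -o /dev/null -w "%{time_total}s\n" http://localhost:5005/health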
Performance Optimization
1. Enable GPU
Ensure CUDA/ROCm is properly installed:
# Check GPU
nvidia-smi # or rocm-smi
# Set in environment
export CUDA_VISIBLE_DEVICES=0
2. Optimize Workers
# For CPU-heavy workloads
workers = multiprocessing.cpu_count()
threads = 1
# For I/O-heavy workloads
workers = multiprocessing.cpu_count() * 2
threads = 4
3. Enable Caching
Use Redis for caching translations:
CACHE_TYPE = 'redis'
CACHE_REDIS_URL = 'redis://localhost:6379/0'
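This assumes a Redis instance is reachable at that URL; on a single host it can be installed and checked quickly:
sudo apt-get install -y redis-server
redis-cli ping   # should answer PONG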
Maintenance
Regular Tasks
- Log Rotation: Configured automatically
- Database Cleanup: Run weekly (see the cron sketch below)
- Model Updates: Check for Whisper updates
- Security Updates: Keep dependencies updated
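A cron entry is one way to schedule the weekly tasks; the cleanup command depends on your schema, so the entry below pairs the backup script sketched earlier with a hypothetical cleanup script as a placeholder:
sudo tee /etc/cron.d/talk2me > /dev/null <<'EOF'
# Weekly backup, Sundays at 03:00
0 3 * * 0  root  /opt/talk2me/backup.sh
# Weekly database cleanup (hypothetical script -- replace with your own)
30 3 * * 0  talk2me  /opt/talk2me/scripts/db_cleanup.sh
EOF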
Update Procedure
# Backup first
./backup.sh
# Update code
git pull
# Update dependencies
sudo -u talk2me /opt/talk2me/venv/bin/pip install -r requirements-prod.txt
# Restart service
sudo systemctl restart talk2me
Rollback
If deployment fails:
# Stop service
sudo systemctl stop talk2me
# Restore backup
tar -xzf talk2me-backup.tar.gz -C /
# Restart service
sudo systemctl start talk2me