talk2me/PRODUCTION_DEPLOYMENT.md
Adolfo Delorenzo 92fd390866 Add production WSGI server - Flask dev server unsuitable for production load
This adds a complete production deployment setup using Gunicorn as the WSGI server, replacing Flask's development server.

Key components:
- Gunicorn configuration with optimized worker settings
- Support for sync, threaded, and async (gevent) workers
- Automatic worker recycling to prevent memory leaks
- Increased timeouts for audio processing
- Production-ready logging and monitoring

Deployment options:
1. Docker/Docker Compose for containerized deployment
2. Systemd service for traditional deployment
3. Nginx reverse proxy configuration
4. SSL/TLS support

Production features:
- wsgi.py entry point for WSGI servers
- gunicorn_config.py with production settings
- Dockerfile with multi-stage build
- docker-compose.yml with full stack (Redis, PostgreSQL)
- nginx.conf with caching and security headers
- systemd service with security hardening
- deploy.sh automated deployment script

Configuration:
- .env.production template with all settings
- Support for environment-based configuration
- Separate requirements-prod.txt
- Prometheus metrics endpoint (/metrics)

Monitoring:
- Health check endpoints for liveness/readiness
- Prometheus-compatible metrics
- Structured logging
- Memory usage tracking
- Request counting

Security:
- Non-root user in Docker
- Systemd security restrictions
- Nginx security headers
- File permission hardening
- Resource limits

Documentation:
- Comprehensive PRODUCTION_DEPLOYMENT.md
- Scaling strategies
- Performance tuning guide
- Troubleshooting section

Also fixes a GC stats collection error in memory_manager.py.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-03 08:49:32 -06:00


Production Deployment Guide

This guide covers deploying Talk2Me in a production environment using a proper WSGI server.

Overview

The Flask development server is not suitable for production use. This guide covers:

  • Gunicorn as the WSGI server
  • Nginx as a reverse proxy
  • Docker for containerization
  • Systemd for process management
  • Security best practices

Quick Start with Docker

1. Using Docker Compose

# Clone the repository
git clone https://github.com/your-repo/talk2me.git
cd talk2me

# Create .env file with production settings
cat > .env <<EOF
TTS_API_KEY=your-api-key
ADMIN_TOKEN=your-secure-admin-token
SECRET_KEY=your-secure-secret-key
POSTGRES_PASSWORD=your-secure-db-password
EOF

# Build and start services
docker-compose up -d

# Check status
docker-compose ps
docker-compose logs -f talk2me
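
The repository's docker-compose.yml wires up the full stack (app, Redis, PostgreSQL). As a rough sketch of what such a file contains — service names, images, and mounts here are illustrative, not the actual file:

```yaml
services:
  talk2me:
    build: .
    ports:
      - "5005:5005"
    env_file: .env
    depends_on:
      - redis
      - postgres
    volumes:
      - ./logs:/app/logs
  redis:
    image: redis:7-alpine
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: talk2me
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
```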

2. Using Docker (standalone)

# Build the image
docker build -t talk2me .

# Run the container
docker run -d \
  --name talk2me \
  -p 5005:5005 \
  -e TTS_API_KEY=your-api-key \
  -e ADMIN_TOKEN=your-secure-token \
  -e SECRET_KEY=your-secure-key \
  -v $(pwd)/logs:/app/logs \
  talk2me

Manual Deployment

1. System Requirements

  • Ubuntu 20.04+ or similar Linux distribution
  • Python 3.8+
  • Nginx
  • Systemd
  • 4GB+ RAM recommended
  • GPU (optional, for faster transcription)

2. Installation

Run the deployment script as root:

sudo ./deploy.sh

Or manually:

# Install system dependencies
sudo apt-get update
sudo apt-get install -y python3-pip python3-venv nginx

# Create application user
sudo useradd -m -s /bin/bash talk2me

# Create directories
sudo mkdir -p /opt/talk2me /var/log/talk2me
sudo chown talk2me:talk2me /opt/talk2me /var/log/talk2me

# Copy application files
sudo cp -r . /opt/talk2me/
sudo chown -R talk2me:talk2me /opt/talk2me

# Install Python dependencies
sudo -u talk2me python3 -m venv /opt/talk2me/venv
sudo -u talk2me /opt/talk2me/venv/bin/pip install -r requirements-prod.txt

# Configure and start services
sudo cp talk2me.service /etc/systemd/system/
sudo systemctl enable talk2me
sudo systemctl start talk2me
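
The talk2me.service unit ships with the repository. A unit with the kind of security hardening described in this guide might look roughly like the following — paths and directives are illustrative, not the shipped file:

```ini
[Unit]
Description=Talk2Me WSGI service
After=network.target

[Service]
User=talk2me
Group=talk2me
WorkingDirectory=/opt/talk2me
EnvironmentFile=/opt/talk2me/.env
ExecStart=/opt/talk2me/venv/bin/gunicorn -c gunicorn_config.py wsgi:application
Restart=on-failure

# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/log/talk2me

[Install]
WantedBy=multi-user.target
```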

Gunicorn Configuration

The gunicorn_config.py file contains production-ready settings:

Worker Configuration

# Number of worker processes
workers = multiprocessing.cpu_count() * 2 + 1

# Worker timeout (increased for audio processing)
timeout = 120

# Restart workers periodically to prevent memory leaks
max_requests = 1000
max_requests_jitter = 50
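
The GUNICORN_* environment variables used in the tuning examples below only take effect if gunicorn_config.py reads them. A sketch of how that wiring might look (variable names match the examples; the defaults are assumptions):

```python
import multiprocessing
import os

# Fall back to CPU-based defaults when the env vars are unset
workers = int(os.environ.get("GUNICORN_WORKERS",
                             multiprocessing.cpu_count() * 2 + 1))
threads = int(os.environ.get("GUNICORN_THREADS", 1))
worker_class = os.environ.get("GUNICORN_WORKER_CLASS", "sync")
worker_connections = int(os.environ.get("GUNICORN_WORKER_CONNECTIONS", 1000))
```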

Performance Tuning

For different workloads:

# CPU-bound (transcription heavy)
export GUNICORN_WORKERS=8
export GUNICORN_THREADS=1

# I/O-bound (many concurrent requests)
export GUNICORN_WORKERS=4
export GUNICORN_THREADS=4
export GUNICORN_WORKER_CLASS=gthread

# Async (best concurrency)
export GUNICORN_WORKER_CLASS=gevent
export GUNICORN_WORKER_CONNECTIONS=1000

Nginx Configuration

Basic Setup

The provided nginx.conf includes:

  • Reverse proxy to Gunicorn
  • Static file serving
  • WebSocket support
  • Security headers
  • Gzip compression

SSL/TLS Setup

server {
    listen 443 ssl http2;
    server_name your-domain.com;
    
    ssl_certificate /etc/letsencrypt/live/your-domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-domain.com/privkey.pem;
    
    # Strong SSL configuration
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers off;
    
    # HSTS
    add_header Strict-Transport-Security "max-age=63072000" always;
}

Environment Variables

Required

# Security
SECRET_KEY=your-very-secure-secret-key
ADMIN_TOKEN=your-admin-api-token

# TTS Configuration
TTS_API_KEY=your-tts-api-key
TTS_SERVER_URL=http://your-tts-server:5050/v1/audio/speech

# Flask
FLASK_ENV=production

Optional

# Performance
GUNICORN_WORKERS=4
GUNICORN_THREADS=2
MEMORY_THRESHOLD_MB=4096
GPU_MEMORY_THRESHOLD_MB=2048

# Database (for session storage)
DATABASE_URL=postgresql://user:pass@localhost/talk2me
REDIS_URL=redis://localhost:6379/0

# Monitoring
SENTRY_DSN=your-sentry-dsn

Monitoring

Health Checks

# Basic health check
curl http://localhost:5005/health

# Detailed health check
curl http://localhost:5005/health/detailed

# Memory usage
curl -H "X-Admin-Token: your-token" http://localhost:5005/admin/memory

Logs

# Application logs
tail -f /var/log/talk2me/talk2me.log

# Error logs
tail -f /var/log/talk2me/errors.log

# Gunicorn logs
journalctl -u talk2me -f

# Nginx logs
tail -f /var/log/nginx/access.log
tail -f /var/log/nginx/error.log

Metrics

With the prometheus_client package installed:

# Prometheus metrics endpoint
curl http://localhost:5005/metrics
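
The /metrics endpoint returns the Prometheus plain-text exposition format. A stdlib-only sketch of what that format looks like (metric names are illustrative; the real endpoint is generated by prometheus_client):

```python
def render_metrics(metrics):
    """Render {name: (help, type, value)} as Prometheus text exposition format."""
    lines = []
    for name, (help_text, mtype, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {mtype}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

sample = {
    "talk2me_requests_total": ("Total HTTP requests.", "counter", 1042),
    "talk2me_memory_bytes": ("Resident memory in bytes.", "gauge", 268435456),
}
print(render_metrics(sample))
```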

Scaling

Horizontal Scaling

For multiple servers:

  1. Use Redis for session storage
  2. Use PostgreSQL for persistent data
  3. Load balance with Nginx:
upstream talk2me_backends {
    least_conn;
    server server1:5005 weight=1;
    server server2:5005 weight=1;
    server server3:5005 weight=1;
}
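
For step 1, server-side sessions in Redis might be configured along these lines — a fragment of the app setup, assuming the Flask-Session extension (key names follow its documented config):

```python
# Assumes: pip install flask-session redis
import redis
from flask_session import Session

app.config["SESSION_TYPE"] = "redis"
app.config["SESSION_REDIS"] = redis.from_url("redis://localhost:6379/0")
Session(app)
```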

Vertical Scaling

Adjust based on load:

# High memory usage
MEMORY_THRESHOLD_MB=8192
GPU_MEMORY_THRESHOLD_MB=4096

# More workers
GUNICORN_WORKERS=16
GUNICORN_THREADS=4

# Larger file limits (an nginx.conf directive, not an env var)
client_max_body_size 100M;

Security

Firewall

# Allow only necessary ports
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow 22/tcp
sudo ufw enable

File Permissions

# Secure file permissions
sudo chmod 750 /opt/talk2me
sudo chmod 640 /opt/talk2me/.env
sudo chmod 755 /opt/talk2me/static

AppArmor/SELinux

Create an AppArmor or SELinux profile that confines the service to its own directories and network ports.

Backup

Database Backup

# PostgreSQL
pg_dump talk2me > backup.sql

# Redis
redis-cli BGSAVE

Application Backup

# Backup application and logs
tar -czf talk2me-backup.tar.gz \
  /opt/talk2me \
  /var/log/talk2me \
  /etc/systemd/system/talk2me.service \
  /etc/nginx/sites-available/talk2me

Troubleshooting

Service Won't Start

# Check service status
systemctl status talk2me

# Check logs
journalctl -u talk2me -n 100

# Test configuration
sudo -u talk2me /opt/talk2me/venv/bin/gunicorn --check-config wsgi:application

High Memory Usage

# Trigger cleanup
curl -X POST -H "X-Admin-Token: token" http://localhost:5005/admin/memory/cleanup

# Restart workers
systemctl reload talk2me

Slow Response Times

  1. Check worker count
  2. Enable async workers
  3. Check GPU availability
  4. Review nginx buffering settings

Performance Optimization

1. Enable GPU

Ensure CUDA/ROCm is properly installed:

# Check GPU
nvidia-smi  # or rocm-smi

# Set in environment
export CUDA_VISIBLE_DEVICES=0

2. Optimize Workers

import multiprocessing

# For CPU-heavy workloads
workers = multiprocessing.cpu_count()
threads = 1

# For I/O-heavy workloads
workers = multiprocessing.cpu_count() * 2
threads = 4

3. Enable Caching

Use Redis for caching translations:

CACHE_TYPE = 'redis'
CACHE_REDIS_URL = 'redis://localhost:6379/0'
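
Underneath, translation caching is the cache-aside pattern. A minimal stdlib sketch with a dict standing in for Redis (key scheme and TTL are illustrative):

```python
import hashlib
import time

_cache = {}        # stands in for Redis
CACHE_TTL = 3600   # seconds

def cached_translate(text, target_lang, translate_fn):
    """Return a cached translation if still fresh, else compute and store it."""
    key = hashlib.sha256(f"{target_lang}:{text}".encode()).hexdigest()
    hit = _cache.get(key)
    if hit is not None and time.time() - hit[1] < CACHE_TTL:
        return hit[0]
    result = translate_fn(text, target_lang)
    _cache[key] = (result, time.time())
    return result

calls = []
fake = lambda text, lang: calls.append(text) or f"{text}-{lang}"
print(cached_translate("hello", "es", fake))  # computes and stores
print(cached_translate("hello", "es", fake))  # served from cache
```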

Maintenance

Regular Tasks

  1. Log Rotation: Configured automatically
  2. Database Cleanup: Run weekly
  3. Model Updates: Check for Whisper updates
  4. Security Updates: Keep dependencies updated
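
"Configured automatically" for log rotation typically means a logrotate drop-in such as /etc/logrotate.d/talk2me; one might look like this (rotation counts are illustrative):

```
/var/log/talk2me/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
```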

Update Procedure

# Backup first
./backup.sh

# Update code
git pull

# Update dependencies
sudo -u talk2me /opt/talk2me/venv/bin/pip install -r requirements-prod.txt

# Restart service
sudo systemctl restart talk2me

Rollback

If deployment fails:

# Stop service
sudo systemctl stop talk2me

# Restore backup
tar -xzf talk2me-backup.tar.gz -C /

# Restart service
sudo systemctl start talk2me