talk2me/PRODUCTION_DEPLOYMENT.md
Adolfo Delorenzo 92fd390866 Add production WSGI server - Flask dev server unsuitable for production load
This adds a complete production deployment setup using Gunicorn as the WSGI server, replacing Flask's development server.

Key components:
- Gunicorn configuration with optimized worker settings
- Support for sync, threaded, and async (gevent) workers
- Automatic worker recycling to prevent memory leaks
- Increased timeouts for audio processing
- Production-ready logging and monitoring

Deployment options:
1. Docker/Docker Compose for containerized deployment
2. Systemd service for traditional deployment
3. Nginx reverse proxy configuration
4. SSL/TLS support

Production features:
- wsgi.py entry point for WSGI servers
- gunicorn_config.py with production settings
- Dockerfile with multi-stage build
- docker-compose.yml with full stack (Redis, PostgreSQL)
- nginx.conf with caching and security headers
- systemd service with security hardening
- deploy.sh automated deployment script

Configuration:
- .env.production template with all settings
- Support for environment-based configuration
- Separate requirements-prod.txt
- Prometheus metrics endpoint (/metrics)

Monitoring:
- Health check endpoints for liveness/readiness
- Prometheus-compatible metrics
- Structured logging
- Memory usage tracking
- Request counting

Security:
- Non-root user in Docker
- Systemd security restrictions
- Nginx security headers
- File permission hardening
- Resource limits

Documentation:
- Comprehensive PRODUCTION_DEPLOYMENT.md
- Scaling strategies
- Performance tuning guide
- Troubleshooting section

Also fixed memory_manager.py GC stats collection error.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-03 08:49:32 -06:00


# Production Deployment Guide
This guide covers deploying Talk2Me in a production environment using a proper WSGI server.
## Overview
The Flask development server is not suitable for production use. This guide covers:
- Gunicorn as the WSGI server
- Nginx as a reverse proxy
- Docker for containerization
- Systemd for process management
- Security best practices
## Quick Start with Docker
### 1. Using Docker Compose
```bash
# Clone the repository
git clone https://github.com/your-repo/talk2me.git
cd talk2me
# Create .env file with production settings
cat > .env <<EOF
TTS_API_KEY=your-api-key
ADMIN_TOKEN=your-secure-admin-token
SECRET_KEY=your-secure-secret-key
POSTGRES_PASSWORD=your-secure-db-password
EOF
# Build and start services
docker-compose up -d
# Check status
docker-compose ps
docker-compose logs -f talk2me
```
### 2. Using Docker (standalone)
```bash
# Build the image
docker build -t talk2me .
# Run the container
docker run -d \
  --name talk2me \
  -p 5005:5005 \
  -e TTS_API_KEY=your-api-key \
  -e ADMIN_TOKEN=your-secure-token \
  -e SECRET_KEY=your-secure-key \
  -v $(pwd)/logs:/app/logs \
  talk2me
```
## Manual Deployment
### 1. System Requirements
- Ubuntu 20.04+ or similar Linux distribution
- Python 3.8+
- Nginx
- Systemd
- 4GB+ RAM recommended
- GPU (optional, for faster transcription)
### 2. Installation
Run the deployment script as root:
```bash
sudo ./deploy.sh
```
Or manually:
```bash
# Install system dependencies
sudo apt-get update
sudo apt-get install -y python3-pip python3-venv nginx
# Create application user
sudo useradd -m -s /bin/bash talk2me
# Create directories
sudo mkdir -p /opt/talk2me /var/log/talk2me
sudo chown talk2me:talk2me /opt/talk2me /var/log/talk2me
# Copy application files
sudo cp -r . /opt/talk2me/
sudo chown -R talk2me:talk2me /opt/talk2me
# Install Python dependencies
sudo -u talk2me python3 -m venv /opt/talk2me/venv
sudo -u talk2me /opt/talk2me/venv/bin/pip install -r requirements-prod.txt
# Configure and start services
sudo cp talk2me.service /etc/systemd/system/
sudo systemctl enable talk2me
sudo systemctl start talk2me
```
## Gunicorn Configuration
The `gunicorn_config.py` file contains production-ready settings:
### Worker Configuration
```python
import multiprocessing

# Number of worker processes
workers = multiprocessing.cpu_count() * 2 + 1

# Worker timeout (increased for audio processing)
timeout = 120

# Restart workers periodically to prevent memory leaks
max_requests = 1000
max_requests_jitter = 50
```
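The same settings can be overridden per environment using the variables shown under Performance Tuning below. A minimal sketch, assuming `gunicorn_config.py` reads its settings from the environment (the variable names match the ones used in this guide; the exact parsing in the real config may differ):

```python
import multiprocessing
import os

# Fall back to the CPU-based formula when no override is set.
default_workers = multiprocessing.cpu_count() * 2 + 1

workers = int(os.environ.get("GUNICORN_WORKERS", default_workers))
threads = int(os.environ.get("GUNICORN_THREADS", "1"))
worker_class = os.environ.get("GUNICORN_WORKER_CLASS", "sync")
```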
### Performance Tuning
For different workloads:
```bash
# CPU-bound (transcription heavy)
export GUNICORN_WORKERS=8
export GUNICORN_THREADS=1
# I/O-bound (many concurrent requests)
export GUNICORN_WORKERS=4
export GUNICORN_THREADS=4
export GUNICORN_WORKER_CLASS=gthread
# Async (best concurrency)
export GUNICORN_WORKER_CLASS=gevent
export GUNICORN_WORKER_CONNECTIONS=1000
```
## Nginx Configuration
### Basic Setup
The provided `nginx.conf` includes:
- Reverse proxy to Gunicorn
- Static file serving
- WebSocket support
- Security headers
- Gzip compression
### SSL/TLS Setup
```nginx
server {
    listen 443 ssl http2;
    server_name your-domain.com;

    ssl_certificate /etc/letsencrypt/live/your-domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-domain.com/privkey.pem;

    # Strong SSL configuration
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers off;

    # HSTS
    add_header Strict-Transport-Security "max-age=63072000" always;
}
```
## Environment Variables
### Required
```bash
# Security
SECRET_KEY=your-very-secure-secret-key
ADMIN_TOKEN=your-admin-api-token
# TTS Configuration
TTS_API_KEY=your-tts-api-key
TTS_SERVER_URL=http://your-tts-server:5050/v1/audio/speech
# Flask
FLASK_ENV=production
```
### Optional
```bash
# Performance
GUNICORN_WORKERS=4
GUNICORN_THREADS=2
MEMORY_THRESHOLD_MB=4096
GPU_MEMORY_THRESHOLD_MB=2048
# Database (for session storage)
DATABASE_URL=postgresql://user:pass@localhost/talk2me
REDIS_URL=redis://localhost:6379/0
# Monitoring
SENTRY_DSN=your-sentry-dsn
```
## Monitoring
### Health Checks
```bash
# Basic health check
curl http://localhost:5005/health
# Detailed health check
curl http://localhost:5005/health/detailed
# Memory usage
curl -H "X-Admin-Token: your-token" http://localhost:5005/admin/memory
```
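The exact payload returned by `/health/detailed` is defined by the application; as a rough illustration, a detailed health response typically bundles process and usage information along these lines (the field names here are hypothetical, not the app's actual schema):

```python
import os
import time

START_TIME = time.time()

def detailed_health(requests_served: int = 0) -> dict:
    # Illustrative payload only; the real endpoint's fields are app-defined.
    return {
        "status": "ok",
        "pid": os.getpid(),
        "uptime_seconds": round(time.time() - START_TIME, 1),
        "requests_served": requests_served,
    }
```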
### Logs
```bash
# Application logs
tail -f /var/log/talk2me/talk2me.log
# Error logs
tail -f /var/log/talk2me/errors.log
# Gunicorn logs
journalctl -u talk2me -f
# Nginx logs
tail -f /var/log/nginx/access.log
tail -f /var/log/nginx/error.log
```
### Metrics
With Prometheus client installed:
```bash
# Prometheus metrics endpoint
curl http://localhost:5005/metrics
```
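The endpoint serves metrics in the Prometheus text exposition format. A minimal sketch of what a scrape returns (the metric names below are hypothetical, not the names the app actually exports):

```python
def render_counters(metrics: dict) -> str:
    """Render a dict of counters in Prometheus text exposition format."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

sample = render_counters({
    "talk2me_requests_total": 1024,        # hypothetical metric name
    "talk2me_transcriptions_total": 87,    # hypothetical metric name
})
```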
## Scaling
### Horizontal Scaling
For multiple servers:
1. Use Redis for session storage
2. Use PostgreSQL for persistent data
3. Load balance with Nginx:
```nginx
upstream talk2me_backends {
    least_conn;
    server server1:5005 weight=1;
    server server2:5005 weight=1;
    server server3:5005 weight=1;
}
```
### Vertical Scaling
Adjust based on load:
```bash
# High memory usage
MEMORY_THRESHOLD_MB=8192
GPU_MEMORY_THRESHOLD_MB=4096
# More workers
GUNICORN_WORKERS=16
GUNICORN_THREADS=4
# Larger file limits
client_max_body_size 100M;
```
## Security
### Firewall
```bash
# Allow only necessary ports
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow 22/tcp
sudo ufw enable
```
### File Permissions
```bash
# Secure file permissions
sudo chmod 750 /opt/talk2me
sudo chmod 640 /opt/talk2me/.env
sudo chmod 755 /opt/talk2me/static
```
### AppArmor/SELinux
Create security profiles to restrict application access.
## Backup
### Database Backup
```bash
# PostgreSQL
pg_dump talk2me > backup.sql
# Redis
redis-cli BGSAVE
```
### Application Backup
```bash
# Backup application and logs
tar -czf talk2me-backup.tar.gz \
  /opt/talk2me \
  /var/log/talk2me \
  /etc/systemd/system/talk2me.service \
  /etc/nginx/sites-available/talk2me
```
## Troubleshooting
### Service Won't Start
```bash
# Check service status
systemctl status talk2me
# Check logs
journalctl -u talk2me -n 100
# Test configuration
sudo -u talk2me /opt/talk2me/venv/bin/gunicorn --check-config wsgi:application
```
### High Memory Usage
```bash
# Trigger cleanup
curl -X POST -H "X-Admin-Token: token" http://localhost:5005/admin/memory/cleanup
# Restart workers
systemctl reload talk2me
```
### Slow Response Times
1. Check worker count
2. Enable async workers
3. Check GPU availability
4. Review nginx buffering settings
## Performance Optimization
### 1. Enable GPU
Ensure CUDA/ROCm is properly installed:
```bash
# Check GPU
nvidia-smi # or rocm-smi
# Set in environment
export CUDA_VISIBLE_DEVICES=0
```
### 2. Optimize Workers
```python
from multiprocessing import cpu_count

# For CPU-heavy workloads
workers = cpu_count()
threads = 1

# For I/O-heavy workloads
workers = cpu_count() * 2
threads = 4
```
### 3. Enable Caching
Use Redis for caching translations:
```python
CACHE_TYPE = 'redis'
CACHE_REDIS_URL = 'redis://localhost:6379/0'
```
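The cache key scheme is up to the application; a common approach is to hash the request parameters into a short, Redis-safe key so identical translation requests hit the cache. A sketch (the key prefix and helper name are illustrative, not part of the codebase):

```python
import hashlib

def translation_cache_key(text: str, source: str, target: str) -> str:
    """Derive a stable, short Redis key for a translation request.

    Hashing keeps keys bounded in length and free of unsafe characters;
    the 'talk2me:translation:' prefix namespaces them within Redis.
    """
    digest = hashlib.sha256(f"{source}:{target}:{text}".encode()).hexdigest()
    return f"talk2me:translation:{digest[:16]}"
```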
## Maintenance
### Regular Tasks
1. **Log Rotation**: Configured automatically
2. **Database Cleanup**: Run weekly
3. **Model Updates**: Check for Whisper updates
4. **Security Updates**: Keep dependencies updated
### Update Procedure
```bash
# Backup first
./backup.sh
# Update code
git pull
# Update dependencies
sudo -u talk2me /opt/talk2me/venv/bin/pip install -r requirements-prod.txt
# Restart service
sudo systemctl restart talk2me
```
## Rollback
If deployment fails:
```bash
# Stop service
sudo systemctl stop talk2me
# Restore backup
tar -xzf talk2me-backup.tar.gz -C /
# Restart service
sudo systemctl start talk2me
```