This adds a complete production deployment setup using Gunicorn as the WSGI server, replacing Flask's development server.

Key components:
- Gunicorn configuration with optimized worker settings
- Support for sync, threaded, and async (gevent) workers
- Automatic worker recycling to prevent memory leaks
- Increased timeouts for audio processing
- Production-ready logging and monitoring

Deployment options:
1. Docker/Docker Compose for containerized deployment
2. Systemd service for traditional deployment
3. Nginx reverse proxy configuration
4. SSL/TLS support

Production features:
- wsgi.py entry point for WSGI servers
- gunicorn_config.py with production settings
- Dockerfile with multi-stage build
- docker-compose.yml with full stack (Redis, PostgreSQL)
- nginx.conf with caching and security headers
- systemd service with security hardening
- deploy.sh automated deployment script

Configuration:
- .env.production template with all settings
- Support for environment-based configuration
- Separate requirements-prod.txt
- Prometheus metrics endpoint (/metrics)

Monitoring:
- Health check endpoints for liveness/readiness
- Prometheus-compatible metrics
- Structured logging
- Memory usage tracking
- Request counting

Security:
- Non-root user in Docker
- Systemd security restrictions
- Nginx security headers
- File permission hardening
- Resource limits

Documentation:
- Comprehensive PRODUCTION_DEPLOYMENT.md
- Scaling strategies
- Performance tuning guide
- Troubleshooting section

Also fixed memory_manager.py GC stats collection error.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

435 lines
7.6 KiB
Markdown
# Production Deployment Guide

This guide covers deploying Talk2Me in a production environment using a proper WSGI server.

## Overview

The Flask development server is not suitable for production use. This guide covers:

- Gunicorn as the WSGI server
- Nginx as a reverse proxy
- Docker for containerization
- Systemd for process management
- Security best practices

## Quick Start with Docker

### 1. Using Docker Compose

```bash
# Clone the repository
git clone https://github.com/your-repo/talk2me.git
cd talk2me

# Create .env file with production settings
cat > .env <<EOF
TTS_API_KEY=your-api-key
ADMIN_TOKEN=your-secure-admin-token
SECRET_KEY=your-secure-secret-key
POSTGRES_PASSWORD=your-secure-db-password
EOF

# Build and start services
docker-compose up -d

# Check status
docker-compose ps
docker-compose logs -f talk2me
```

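The placeholder values in the `.env` example should be replaced with unpredictable secrets. One way to produce them is sketched below with Python's standard `secrets` module. The helper name `generate_env_secrets` and the 32-byte default are assumptions for illustration and are not part of Talk2Me; `TTS_API_KEY` is left out because it is issued by the TTS provider.

```python
import secrets

# Variable names taken from the .env example above.
SECRET_VARS = ("ADMIN_TOKEN", "SECRET_KEY", "POSTGRES_PASSWORD")


def generate_env_secrets(nbytes: int = 32) -> dict:
    """Return a fresh URL-safe random value for each secret variable."""
    return {name: secrets.token_urlsafe(nbytes) for name in SECRET_VARS}


if __name__ == "__main__":
    # Print KEY=value lines that can be pasted into .env
    for name, value in generate_env_secrets().items():
        print(f"{name}={value}")
```

Running the script prints three `KEY=value` lines with different values on every run.
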
### 2. Using Docker (standalone)

```bash
# Build the image
docker build -t talk2me .

# Run the container (the volume path is quoted so directories with spaces work)
docker run -d \
  --name talk2me \
  -p 5005:5005 \
  -e TTS_API_KEY=your-api-key \
  -e ADMIN_TOKEN=your-secure-token \
  -e SECRET_KEY=your-secure-key \
  -v "$(pwd)/logs:/app/logs" \
  talk2me
```

## Manual Deployment

### 1. System Requirements

- Ubuntu 20.04+ or similar Linux distribution
- Python 3.8+
- Nginx
- Systemd
- 4GB+ RAM recommended
- GPU (optional, for faster transcription)

### 2. Installation

Run the deployment script as root:

```bash
sudo ./deploy.sh
```

Or manually:

```bash
# Install system dependencies
sudo apt-get update
sudo apt-get install -y python3-pip python3-venv nginx

# Create application user
sudo useradd -m -s /bin/bash talk2me

# Create directories
sudo mkdir -p /opt/talk2me /var/log/talk2me
sudo chown talk2me:talk2me /opt/talk2me /var/log/talk2me

# Copy application files
sudo cp -r . /opt/talk2me/
sudo chown -R talk2me:talk2me /opt/talk2me

# Install Python dependencies
sudo -u talk2me python3 -m venv /opt/talk2me/venv
sudo -u talk2me /opt/talk2me/venv/bin/pip install -r /opt/talk2me/requirements-prod.txt

# Configure and start services
sudo cp talk2me.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable talk2me
sudo systemctl start talk2me
```

## Gunicorn Configuration

The `gunicorn_config.py` file contains production-ready settings:

### Worker Configuration

```python
import multiprocessing

# Number of worker processes
workers = multiprocessing.cpu_count() * 2 + 1

# Worker timeout in seconds (increased for audio processing)
timeout = 120

# Restart workers periodically to prevent memory leaks
max_requests = 1000
max_requests_jitter = 50
```

### Performance Tuning

Set the following environment variables to match the workload:

```bash
# CPU-bound (transcription heavy)
export GUNICORN_WORKERS=8
export GUNICORN_THREADS=1

# I/O-bound (many concurrent requests)
export GUNICORN_WORKERS=4
export GUNICORN_THREADS=4
export GUNICORN_WORKER_CLASS=gthread

# Async (best concurrency)
export GUNICORN_WORKER_CLASS=gevent
export GUNICORN_WORKER_CONNECTIONS=1000
```

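How `gunicorn_config.py` consumes these variables is not shown in this guide. The following is a minimal sketch of one plausible approach, assuming each `GUNICORN_*` variable overrides a default. The function name `load_gunicorn_settings` is hypothetical; the fallbacks for `threads`, `worker_class`, and `worker_connections` are Gunicorn's own defaults, and the worker count falls back to the formula from the Worker Configuration section.

```python
import multiprocessing
import os


def load_gunicorn_settings(env=None) -> dict:
    """Build Gunicorn settings from GUNICORN_* variables, with fallbacks."""
    env = os.environ if env is None else env
    default_workers = multiprocessing.cpu_count() * 2 + 1
    return {
        "workers": int(env.get("GUNICORN_WORKERS", default_workers)),
        "threads": int(env.get("GUNICORN_THREADS", 1)),
        "worker_class": env.get("GUNICORN_WORKER_CLASS", "sync"),
        "worker_connections": int(env.get("GUNICORN_WORKER_CONNECTIONS", 1000)),
    }


# In a Gunicorn config file the values would be assigned to module-level
# names, for example:
#   settings = load_gunicorn_settings()
#   workers = settings["workers"]
```

Passing the environment as a plain mapping keeps the function easy to check without touching the real process environment.
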
## Nginx Configuration

### Basic Setup

The provided `nginx.conf` includes:

- Reverse proxy to Gunicorn
- Static file serving
- WebSocket support
- Security headers
- Gzip compression

### SSL/TLS Setup

```nginx
server {
    listen 443 ssl http2;
    server_name your-domain.com;

    ssl_certificate /etc/letsencrypt/live/your-domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-domain.com/privkey.pem;

    # Strong SSL configuration
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers off;

    # HSTS
    add_header Strict-Transport-Security "max-age=63072000" always;
}
```

## Environment Variables

### Required

```bash
# Security
SECRET_KEY=your-very-secure-secret-key
ADMIN_TOKEN=your-admin-api-token

# TTS Configuration
TTS_API_KEY=your-tts-api-key
TTS_SERVER_URL=http://your-tts-server:5050/v1/audio/speech

# Flask
FLASK_ENV=production
```

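A missing required variable is easier to diagnose when the process refuses to start than when a request fails later. The sketch below shows one way to check the variables listed above at startup. The function name `missing_required` is an assumption for illustration; whether Talk2Me performs such a check itself is not stated in this guide.

```python
import os

# Names taken from the "Required" block above.
REQUIRED_VARS = (
    "SECRET_KEY",
    "ADMIN_TOKEN",
    "TTS_API_KEY",
    "TTS_SERVER_URL",
    "FLASK_ENV",
)


def missing_required(env=None) -> list:
    """Return the required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_required()
    if missing:
        raise SystemExit("Missing required settings: " + ", ".join(missing))
```
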
### Optional

```bash
# Performance
GUNICORN_WORKERS=4
GUNICORN_THREADS=2
MEMORY_THRESHOLD_MB=4096
GPU_MEMORY_THRESHOLD_MB=2048

# Database (for session storage)
DATABASE_URL=postgresql://user:pass@localhost/talk2me
REDIS_URL=redis://localhost:6379/0

# Monitoring
SENTRY_DSN=your-sentry-dsn
```

## Monitoring

### Health Checks

```bash
# Basic health check
curl http://localhost:5005/health

# Detailed health check
curl http://localhost:5005/health/detailed

# Memory usage
curl -H "X-Admin-Token: your-token" http://localhost:5005/admin/memory
```

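After a deployment or restart, the `/health` endpoint may take a moment to respond while workers start up. The following sketch polls it until it answers, using only the standard library. The helper names `http_ok` and `wait_until_healthy`, the retry counts, and the assumption that a healthy instance answers with HTTP 200 are all illustrative.

```python
import time
import urllib.error
import urllib.request


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(check, attempts: int = 10, delay: float = 2.0) -> bool:
    """Call `check()` until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # no sleep after the final attempt
    return False
```

A deployment script could then call `wait_until_healthy(lambda: http_ok("http://localhost:5005/health"))` and abort if it returns `False`.
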
### Logs

```bash
# Application logs
tail -f /var/log/talk2me/talk2me.log

# Error logs
tail -f /var/log/talk2me/errors.log

# Gunicorn logs
journalctl -u talk2me -f

# Nginx logs
tail -f /var/log/nginx/access.log
tail -f /var/log/nginx/error.log
```

### Metrics

If the Prometheus client library is installed, metrics are available at `/metrics`:

```bash
# Prometheus metrics endpoint
curl http://localhost:5005/metrics
```

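The `/metrics` endpoint returns the Prometheus text exposition format: comment lines starting with `#`, followed by one sample per line. For a quick ad hoc check without a Prometheus server, the output can be parsed as sketched below. This is a simplified reader that assumes samples carry no trailing timestamp, and the metric names in the usage note are hypothetical, not Talk2Me's actual metrics.

```python
def parse_metrics(text: str) -> dict:
    """Map each sample (metric name plus labels) to its float value.

    Simplification: assumes samples carry no trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        key, value = line.rsplit(None, 1)  # the value is the last field
        samples[key] = float(value)
    return samples
```

For example, a body containing the line `requests_total 42` yields a dictionary where `samples["requests_total"]` is `42.0`.
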
## Scaling

### Horizontal Scaling

For multiple servers:

1. Use Redis for session storage
2. Use PostgreSQL for persistent data
3. Load balance with Nginx:

```nginx
upstream talk2me_backends {
    least_conn;
    server server1:5005 weight=1;
    server server2:5005 weight=1;
    server server3:5005 weight=1;
}
```

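The `least_conn` directive sends each new request to the backend that currently has the fewest active connections, which suits requests of uneven duration such as audio processing. The idea can be illustrated with the sketch below. It is a simplified model, not Nginx's implementation: it ignores weights (all are `1` in the block above) and breaks ties by insertion order, and the function name `pick_backend` is hypothetical.

```python
def pick_backend(active: dict) -> str:
    """Return the backend with the fewest active connections.

    Ties go to the first backend in insertion order; weights are ignored.
    """
    return min(active, key=active.get)


# Example: server2 and server3 are tied at 1, so server2 is chosen.
connections = {"server1:5005": 3, "server2:5005": 1, "server3:5005": 1}
chosen = pick_backend(connections)
```
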
### Vertical Scaling

On a larger machine, raise the limits to match the available memory and CPU:

```bash
# High memory usage
MEMORY_THRESHOLD_MB=8192
GPU_MEMORY_THRESHOLD_MB=4096

# More workers
GUNICORN_WORKERS=16
GUNICORN_THREADS=4

# Larger file limits: this is an Nginx directive, set it in nginx.conf
# client_max_body_size 100M;
```

## Security

### Firewall

```bash
# Allow only necessary ports
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow 22/tcp
sudo ufw enable
```

### File Permissions

```bash
# Secure file permissions
sudo chmod 750 /opt/talk2me
sudo chmod 640 /opt/talk2me/.env
sudo chmod 755 /opt/talk2me/static
```

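Permissions tend to drift after updates or manual edits, so it can help to verify them rather than only set them. The sketch below checks whether a file grants any access to "other" users, which mode `640` on `.env` is meant to prevent. The function name `world_accessible` is an assumption for illustration.

```python
import os
import stat


def world_accessible(path: str) -> bool:
    """Return True if 'other' users have any permission bits on `path`."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IRWXO)


if __name__ == "__main__":
    env_file = "/opt/talk2me/.env"
    if os.path.exists(env_file) and world_accessible(env_file):
        print(f"WARNING: {env_file} is accessible to all users")
```
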
### AppArmor/SELinux

Create security profiles to restrict application access.

## Backup

### Database Backup

```bash
# PostgreSQL
pg_dump talk2me > backup.sql

# Redis
redis-cli BGSAVE
```

### Application Backup

```bash
# Backup application and logs
tar -czf talk2me-backup.tar.gz \
  /opt/talk2me \
  /var/log/talk2me \
  /etc/systemd/system/talk2me.service \
  /etc/nginx/sites-available/talk2me
```

## Troubleshooting

### Service Won't Start

```bash
# Check service status
systemctl status talk2me

# Check logs
journalctl -u talk2me -n 100

# Test configuration
sudo -u talk2me /opt/talk2me/venv/bin/gunicorn --check-config wsgi:application
```

### High Memory Usage

```bash
# Trigger cleanup
curl -X POST -H "X-Admin-Token: token" http://localhost:5005/admin/memory/cleanup

# Gracefully reload workers (requires ExecReload in the unit file)
sudo systemctl reload talk2me
```

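The cleanup call can also be issued from a scheduled Python job instead of `curl`. The sketch below builds the same request with the standard library: a `POST` to `/admin/memory/cleanup` carrying the `X-Admin-Token` header. The function name `build_cleanup_request` is hypothetical, and the response format of the endpoint is not documented here, so the example only prints the status code.

```python
import urllib.request


def build_cleanup_request(base_url: str, admin_token: str) -> urllib.request.Request:
    """Build a POST to /admin/memory/cleanup with the admin token header."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/admin/memory/cleanup",
        method="POST",
        headers={"X-Admin-Token": admin_token},
    )


# Sending it requires a running Talk2Me instance:
#   req = build_cleanup_request("http://localhost:5005", "your-token")
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       print(resp.status)
```
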
### Slow Response Times

1. Check worker count
2. Enable async workers
3. Check GPU availability
4. Review nginx buffering settings

## Performance Optimization

### 1. Enable GPU

Ensure CUDA/ROCm is properly installed:

```bash
# Check GPU
nvidia-smi  # or rocm-smi

# Set in environment
export CUDA_VISIBLE_DEVICES=0
```

### 2. Optimize Workers

Use one of the following profiles in `gunicorn_config.py`:

```python
from multiprocessing import cpu_count

# For CPU-heavy workloads
workers = cpu_count()
threads = 1

# For I/O-heavy workloads
workers = cpu_count() * 2
threads = 4
```

### 3. Enable Caching

Use Redis for caching translations:

```python
CACHE_TYPE = 'redis'
CACHE_REDIS_URL = 'redis://localhost:6379/0'
```

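A translation cache needs a key that identifies one request: the input text and the language pair. How Talk2Me builds its keys is not described in this guide, so the following is only a sketch of a common approach, hashing the text so that keys stay short regardless of input length. The function name `translation_cache_key` and the `translation:` prefix are assumptions.

```python
import hashlib


def translation_cache_key(text: str, source_lang: str, target_lang: str) -> str:
    """Derive a fixed-length cache key for one translation request."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"translation:{source_lang}:{target_lang}:{digest}"
```

The same text translated into a different target language produces a different key, so cached results for different language pairs do not collide.
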
## Maintenance

### Regular Tasks

1. **Log Rotation**: Configured automatically
2. **Database Cleanup**: Run weekly
3. **Model Updates**: Check for Whisper updates
4. **Security Updates**: Keep dependencies updated

### Update Procedure

```bash
# Backup first
./backup.sh

# Update code
git pull

# Update dependencies
sudo -u talk2me /opt/talk2me/venv/bin/pip install -r requirements-prod.txt

# Restart service
sudo systemctl restart talk2me
```

## Rollback

If deployment fails:

```bash
# Stop service
sudo systemctl stop talk2me

# Restore backup (root is needed to write under /opt and /etc)
sudo tar -xzf talk2me-backup.tar.gz -C /

# Restart service
sudo systemctl start talk2me
```