talk2me/PRODUCTION_DEPLOYMENT.md
Adolfo Delorenzo 92fd390866 Add production WSGI server - Flask dev server unsuitable for production load
This adds a complete production deployment setup using Gunicorn as the WSGI server, replacing Flask's development server.

Key components:
- Gunicorn configuration with optimized worker settings
- Support for sync, threaded, and async (gevent) workers
- Automatic worker recycling to prevent memory leaks
- Increased timeouts for audio processing
- Production-ready logging and monitoring

Deployment options:
1. Docker/Docker Compose for containerized deployment
2. Systemd service for traditional deployment
3. Nginx reverse proxy configuration
4. SSL/TLS support

Production features:
- wsgi.py entry point for WSGI servers
- gunicorn_config.py with production settings
- Dockerfile with multi-stage build
- docker-compose.yml with full stack (Redis, PostgreSQL)
- nginx.conf with caching and security headers
- systemd service with security hardening
- deploy.sh automated deployment script

Configuration:
- .env.production template with all settings
- Support for environment-based configuration
- Separate requirements-prod.txt
- Prometheus metrics endpoint (/metrics)

Monitoring:
- Health check endpoints for liveness/readiness
- Prometheus-compatible metrics
- Structured logging
- Memory usage tracking
- Request counting

Security:
- Non-root user in Docker
- Systemd security restrictions
- Nginx security headers
- File permission hardening
- Resource limits

Documentation:
- Comprehensive PRODUCTION_DEPLOYMENT.md
- Scaling strategies
- Performance tuning guide
- Troubleshooting section

Also fixed memory_manager.py GC stats collection error.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-03 08:49:32 -06:00


# Production Deployment Guide
This guide covers deploying Talk2Me in a production environment using a proper WSGI server.
## Overview
The Flask development server is not suitable for production use. This guide covers:
- Gunicorn as the WSGI server
- Nginx as a reverse proxy
- Docker for containerization
- Systemd for process management
- Security best practices
## Quick Start with Docker
### 1. Using Docker Compose
```bash
# Clone the repository
git clone https://github.com/your-repo/talk2me.git
cd talk2me
# Create .env file with production settings
cat > .env <<EOF
TTS_API_KEY=your-api-key
ADMIN_TOKEN=your-secure-admin-token
SECRET_KEY=your-secure-secret-key
POSTGRES_PASSWORD=your-secure-db-password
EOF
# Build and start services
docker-compose up -d
# Check status
docker-compose ps
docker-compose logs -f talk2me
```
### 2. Using Docker (standalone)
```bash
# Build the image
docker build -t talk2me .
# Run the container
docker run -d \
  --name talk2me \
  -p 5005:5005 \
  -e TTS_API_KEY=your-api-key \
  -e ADMIN_TOKEN=your-secure-token \
  -e SECRET_KEY=your-secure-key \
  -v $(pwd)/logs:/app/logs \
  talk2me
```
## Manual Deployment
### 1. System Requirements
- Ubuntu 20.04+ or similar Linux distribution
- Python 3.8+
- Nginx
- Systemd
- 4GB+ RAM recommended
- GPU (optional, for faster transcription)
### 2. Installation
Run the deployment script as root:
```bash
sudo ./deploy.sh
```
Or manually:
```bash
# Install system dependencies
sudo apt-get update
sudo apt-get install -y python3-pip python3-venv nginx
# Create application user
sudo useradd -m -s /bin/bash talk2me
# Create directories
sudo mkdir -p /opt/talk2me /var/log/talk2me
sudo chown talk2me:talk2me /opt/talk2me /var/log/talk2me
# Copy application files
sudo cp -r . /opt/talk2me/
sudo chown -R talk2me:talk2me /opt/talk2me
# Install Python dependencies
sudo -u talk2me python3 -m venv /opt/talk2me/venv
sudo -u talk2me /opt/talk2me/venv/bin/pip install -r requirements-prod.txt
# Configure and start services
sudo cp talk2me.service /etc/systemd/system/
sudo systemctl enable talk2me
sudo systemctl start talk2me
```
## Gunicorn Configuration
The `gunicorn_config.py` file contains production-ready settings:
### Worker Configuration
```python
import multiprocessing

# Number of worker processes
workers = multiprocessing.cpu_count() * 2 + 1

# Worker timeout (increased for audio processing)
timeout = 120

# Restart workers periodically to prevent memory leaks
max_requests = 1000
max_requests_jitter = 50
```
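The same settings can be overridden per environment using the variables shown under Performance Tuning below. A minimal sketch, assuming `gunicorn_config.py` reads its settings from the environment (the variable names match the ones used in this guide; the exact parsing in the real config may differ):

```python
import multiprocessing
import os

# Fall back to the CPU-based formula when no override is set.
default_workers = multiprocessing.cpu_count() * 2 + 1

workers = int(os.environ.get("GUNICORN_WORKERS", default_workers))
threads = int(os.environ.get("GUNICORN_THREADS", "1"))
worker_class = os.environ.get("GUNICORN_WORKER_CLASS", "sync")
```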
### Performance Tuning
For different workloads:
```bash
# CPU-bound (transcription heavy)
export GUNICORN_WORKERS=8
export GUNICORN_THREADS=1
# I/O-bound (many concurrent requests)
export GUNICORN_WORKERS=4
export GUNICORN_THREADS=4
export GUNICORN_WORKER_CLASS=gthread
# Async (best concurrency)
export GUNICORN_WORKER_CLASS=gevent
export GUNICORN_WORKER_CONNECTIONS=1000
```
## Nginx Configuration
### Basic Setup
The provided `nginx.conf` includes:
- Reverse proxy to Gunicorn
- Static file serving
- WebSocket support
- Security headers
- Gzip compression
### SSL/TLS Setup
```nginx
server {
    listen 443 ssl http2;
    server_name your-domain.com;

    ssl_certificate /etc/letsencrypt/live/your-domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-domain.com/privkey.pem;

    # Strong SSL configuration
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers off;

    # HSTS
    add_header Strict-Transport-Security "max-age=63072000" always;
}
```
## Environment Variables
### Required
```bash
# Security
SECRET_KEY=your-very-secure-secret-key
ADMIN_TOKEN=your-admin-api-token
# TTS Configuration
TTS_API_KEY=your-tts-api-key
TTS_SERVER_URL=http://your-tts-server:5050/v1/audio/speech
# Flask
FLASK_ENV=production
```
### Optional
```bash
# Performance
GUNICORN_WORKERS=4
GUNICORN_THREADS=2
MEMORY_THRESHOLD_MB=4096
GPU_MEMORY_THRESHOLD_MB=2048
# Database (for session storage)
DATABASE_URL=postgresql://user:pass@localhost/talk2me
REDIS_URL=redis://localhost:6379/0
# Monitoring
SENTRY_DSN=your-sentry-dsn
```
## Monitoring
### Health Checks
```bash
# Basic health check
curl http://localhost:5005/health
# Detailed health check
curl http://localhost:5005/health/detailed
# Memory usage
curl -H "X-Admin-Token: your-token" http://localhost:5005/admin/memory
```
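The exact payload returned by `/health/detailed` is defined by the application; as a rough illustration, a detailed health response typically bundles process and usage information along these lines (the field names here are hypothetical, not the app's actual schema):

```python
import os
import time

START_TIME = time.time()

def detailed_health(requests_served: int = 0) -> dict:
    # Illustrative payload only; the real endpoint's fields are app-defined.
    return {
        "status": "ok",
        "pid": os.getpid(),
        "uptime_seconds": round(time.time() - START_TIME, 1),
        "requests_served": requests_served,
    }
```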
### Logs
```bash
# Application logs
tail -f /var/log/talk2me/talk2me.log
# Error logs
tail -f /var/log/talk2me/errors.log
# Gunicorn logs
journalctl -u talk2me -f
# Nginx logs
tail -f /var/log/nginx/access.log
tail -f /var/log/nginx/error.log
```
### Metrics
With Prometheus client installed:
```bash
# Prometheus metrics endpoint
curl http://localhost:5005/metrics
```
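The endpoint serves metrics in the Prometheus text exposition format. A minimal sketch of what a scrape returns (the metric names below are hypothetical, not the names the app actually exports):

```python
def render_counters(metrics: dict) -> str:
    """Render a dict of counters in Prometheus text exposition format."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

sample = render_counters({
    "talk2me_requests_total": 1024,        # hypothetical metric name
    "talk2me_transcriptions_total": 87,    # hypothetical metric name
})
```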
## Scaling
### Horizontal Scaling
For multiple servers:
1. Use Redis for session storage
2. Use PostgreSQL for persistent data
3. Load balance with Nginx:
```nginx
upstream talk2me_backends {
    least_conn;
    server server1:5005 weight=1;
    server server2:5005 weight=1;
    server server3:5005 weight=1;
}
```
### Vertical Scaling
Adjust based on load:
```bash
# High memory usage
MEMORY_THRESHOLD_MB=8192
GPU_MEMORY_THRESHOLD_MB=4096
# More workers
GUNICORN_WORKERS=16
GUNICORN_THREADS=4
# Larger file limits
client_max_body_size 100M;
```
## Security
### Firewall
```bash
# Allow only necessary ports
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow 22/tcp
sudo ufw enable
```
### File Permissions
```bash
# Secure file permissions
sudo chmod 750 /opt/talk2me
sudo chmod 640 /opt/talk2me/.env
sudo chmod 755 /opt/talk2me/static
```
### AppArmor/SELinux
Create security profiles to restrict application access.
## Backup
### Database Backup
```bash
# PostgreSQL
pg_dump talk2me > backup.sql
# Redis
redis-cli BGSAVE
```
### Application Backup
```bash
# Backup application and logs
tar -czf talk2me-backup.tar.gz \
  /opt/talk2me \
  /var/log/talk2me \
  /etc/systemd/system/talk2me.service \
  /etc/nginx/sites-available/talk2me
```
## Troubleshooting
### Service Won't Start
```bash
# Check service status
systemctl status talk2me
# Check logs
journalctl -u talk2me -n 100
# Test configuration
sudo -u talk2me /opt/talk2me/venv/bin/gunicorn --check-config wsgi:application
```
### High Memory Usage
```bash
# Trigger cleanup
curl -X POST -H "X-Admin-Token: token" http://localhost:5005/admin/memory/cleanup
# Restart workers
systemctl reload talk2me
```
### Slow Response Times
1. Check worker count
2. Enable async workers
3. Check GPU availability
4. Review nginx buffering settings
## Performance Optimization
### 1. Enable GPU
Ensure CUDA/ROCm is properly installed:
```bash
# Check GPU
nvidia-smi # or rocm-smi
# Set in environment
export CUDA_VISIBLE_DEVICES=0
```
### 2. Optimize Workers
```python
from multiprocessing import cpu_count

# For CPU-heavy workloads
workers = cpu_count()
threads = 1

# For I/O-heavy workloads
workers = cpu_count() * 2
threads = 4
```
### 3. Enable Caching
Use Redis for caching translations:
```python
CACHE_TYPE = 'redis'
CACHE_REDIS_URL = 'redis://localhost:6379/0'
```
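The cache key scheme is up to the application; a common approach is to hash the request parameters into a short, Redis-safe key so identical translation requests hit the cache. A sketch (the key prefix and helper name are illustrative, not part of the codebase):

```python
import hashlib

def translation_cache_key(text: str, source: str, target: str) -> str:
    """Derive a stable, short Redis key for a translation request.

    Hashing keeps keys bounded in length and free of unsafe characters;
    the 'talk2me:translation:' prefix namespaces them within Redis.
    """
    digest = hashlib.sha256(f"{source}:{target}:{text}".encode()).hexdigest()
    return f"talk2me:translation:{digest[:16]}"
```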
## Maintenance
### Regular Tasks
1. **Log Rotation**: Configured automatically
2. **Database Cleanup**: Run weekly
3. **Model Updates**: Check for Whisper updates
4. **Security Updates**: Keep dependencies updated
### Update Procedure
```bash
# Backup first
./backup.sh
# Update code
git pull
# Update dependencies
sudo -u talk2me /opt/talk2me/venv/bin/pip install -r requirements-prod.txt
# Restart service
sudo systemctl restart talk2me
```
## Rollback
If deployment fails:
```bash
# Stop service
sudo systemctl stop talk2me
# Restore backup
tar -xzf talk2me-backup.tar.gz -C /
# Restart service
sudo systemctl start talk2me
```