# Talk2Me - Real-Time Voice Language Translator
A production-ready, mobile-friendly web application that provides real-time translation of speech between multiple languages.
## Features
- **Real-time Speech Recognition**: Powered by OpenAI Whisper with GPU acceleration
- **Advanced Translation**: Using Gemma 3 open-source LLM via Ollama
- **Natural Text-to-Speech**: OpenAI Edge TTS for lifelike voice output
- **Progressive Web App**: Full offline support with service workers
- **Multi-Speaker Support**: Track and translate conversations with multiple participants
- **Enterprise Security**: Comprehensive rate limiting, session management, and encrypted secrets
- **Production Ready**: Docker support, load balancing, and extensive monitoring
- **Admin Dashboard**: Real-time analytics, performance monitoring, and system health tracking
## Table of Contents
- [Supported Languages](#supported-languages)
- [Quick Start](#quick-start)
- [Installation](#installation)
- [Configuration](#configuration)
- [Security Features](#security-features)
- [Production Deployment](#production-deployment)
- [API Documentation](#api-documentation)
- [Admin Dashboard](#admin-dashboard)
- [Development](#development)
- [Monitoring & Operations](#monitoring--operations)
- [Troubleshooting](#troubleshooting)
- [Contributing](#contributing)
## Supported Languages
- Arabic
- Armenian
- Azerbaijani
- English
- Farsi
- French
- Georgian
- Kazakh
- Mandarin
- Portuguese
- Russian
- Spanish
- Turkish
- Uzbek
## Quick Start
```bash
# Clone the repository
git clone https://github.com/yourusername/talk2me.git
cd talk2me
# Install dependencies
pip install -r requirements.txt
npm install
# Initialize secure configuration
python manage_secrets.py init
python manage_secrets.py set TTS_API_KEY your-api-key-here
# Ensure Ollama is running with Gemma
ollama pull gemma2:9b
ollama pull gemma3:27b
# Start the application
python app.py
```
Open your browser and navigate to `http://localhost:5005`
## Installation
### Prerequisites
- Python 3.8+
- Node.js 14+
- Ollama (for LLM translation)
- OpenAI Edge TTS server
- Optional: NVIDIA GPU with CUDA, AMD GPU with ROCm, or Apple Silicon
### Detailed Setup
1. **Install Python dependencies**:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
2. **Install Node.js dependencies**:
```bash
npm install
npm run build # Build TypeScript files
```
3. **Configure GPU Support** (Optional):
```bash
# For NVIDIA GPUs
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# For AMD GPUs (ROCm)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
# For Apple Silicon
pip install torch torchvision torchaudio
```
4. **Set up Ollama**:
```bash
# Install Ollama (https://ollama.ai)
curl -fsSL https://ollama.ai/install.sh | sh
# Pull required models
ollama pull gemma2:9b # Faster, for streaming
ollama pull gemma3:27b # Better quality
```
5. **Configure TTS Server**:
Ensure your OpenAI Edge TTS server is running. By default it is expected at `http://localhost:5050`.
## Configuration
### Environment Variables
Talk2Me uses encrypted secrets management for sensitive configuration. You can use either the secure secrets system or traditional environment variables.
#### Using Secure Secrets Management (Recommended)
```bash
# Initialize the secrets system
python manage_secrets.py init
# Set required secrets
python manage_secrets.py set TTS_API_KEY
python manage_secrets.py set TTS_SERVER_URL
python manage_secrets.py set ADMIN_TOKEN
# List all secrets
python manage_secrets.py list
# Rotate encryption keys
python manage_secrets.py rotate
```
#### Using Environment Variables
Create a `.env` file:
```env
# Core Configuration
TTS_API_KEY=your-api-key-here
TTS_SERVER_URL=http://localhost:5050/v1/audio/speech
ADMIN_TOKEN=your-secure-admin-token
# CORS Configuration
CORS_ORIGINS=https://yourdomain.com,https://app.yourdomain.com
ADMIN_CORS_ORIGINS=https://admin.yourdomain.com
# Security Settings
SECRET_KEY=your-secret-key-here
MAX_CONTENT_LENGTH=52428800 # 50MB
SESSION_LIFETIME=3600 # 1 hour
RATE_LIMIT_STORAGE_URL=redis://localhost:6379/0
# Performance Tuning
WHISPER_MODEL_SIZE=base
GPU_MEMORY_THRESHOLD_MB=2048
MEMORY_CLEANUP_INTERVAL=30
```
### Advanced Configuration
#### CORS Settings
```bash
# Development (allow all origins)
export CORS_ORIGINS="*"
# Production (restrict to specific domains)
export CORS_ORIGINS="https://yourdomain.com,https://app.yourdomain.com"
export ADMIN_CORS_ORIGINS="https://admin.yourdomain.com"
```
#### Rate Limiting
Configure per-endpoint rate limits:
```python
# In your config or via admin API
RATE_LIMITS = {
    'default':    {'requests_per_minute': 30, 'requests_per_hour': 500},
    'transcribe': {'requests_per_minute': 10, 'requests_per_hour': 100},
    'translate':  {'requests_per_minute': 20, 'requests_per_hour': 300},
}
```
#### Session Management
```python
SESSION_CONFIG = {
    'max_file_size_mb': 100,
    'max_files_per_session': 100,
    'idle_timeout_minutes': 15,
    'max_lifetime_minutes': 60,
}
```
## Security Features
### 1. Rate Limiting
Comprehensive DoS protection with:
- Token bucket algorithm with sliding window
- Per-endpoint configurable limits
- Automatic IP blocking for abusive clients
- Request size validation
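The limiter's internals aren't shown here, but the core idea behind a token bucket is simple; the following is a minimal illustrative sketch (class and method names are hypothetical, not Talk2Me's actual code):
```python
import time

class TokenBucket:
    """Minimal token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate                    # tokens added per second
        self.capacity = capacity            # maximum burst size
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True if the request may proceed, consuming `cost` tokens."""
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should answer 429 Too Many Requests

# Roughly 30 requests/minute with bursts of up to 10
bucket = TokenBucket(rate=30 / 60, capacity=10)
```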
```bash
# Check rate limit status
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/admin/rate-limits
# Block an IP
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"ip": "192.168.1.100", "duration": 3600}' \
     http://localhost:5005/admin/block-ip
```
### 2. Secrets Management
- AES-128 encryption for sensitive data
- Automatic key rotation
- Audit logging
- Platform-specific secure storage
```bash
# View audit log
python manage_secrets.py audit
# Backup secrets
python manage_secrets.py export --output backup.enc
# Restore from backup
python manage_secrets.py import --input backup.enc
```
### 3. Session Management
- Automatic resource tracking
- Per-session limits (100 files, 100MB)
- Idle session cleanup (15 minutes)
- Real-time monitoring
```bash
# View active sessions
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/admin/sessions
# Clean up specific session
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
     http://localhost:5005/admin/sessions/SESSION_ID/cleanup
```
### 4. Request Size Limits
- Global limit: 50MB
- Audio files: 25MB
- JSON payloads: 1MB
- Dynamic configuration
```bash
# Update size limits
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"max_audio_size": "30MB"}' \
     http://localhost:5005/admin/size-limits
```
## Production Deployment
### Docker Deployment
```bash
# Build and run with Docker Compose (CPU only)
docker-compose up -d
# With NVIDIA GPU support
docker-compose -f docker-compose.yml -f docker-compose.nvidia.yml up -d
# With AMD GPU support (ROCm)
docker-compose -f docker-compose.yml -f docker-compose.amd.yml up -d
# With Apple Silicon support
docker-compose -f docker-compose.yml -f docker-compose.apple.yml up -d
# Scale web workers
docker-compose up -d --scale talk2me=4
# View logs
docker-compose logs -f talk2me
```
### Docker Compose Configuration
Choose the appropriate configuration based on your GPU:
#### NVIDIA GPU Configuration
```yaml
version: '3.8'

services:
  talk2me:
    build: .
    ports:
      - "5005:5005"
    environment:
      - GUNICORN_WORKERS=4
      - GUNICORN_THREADS=2
    volumes:
      - ./logs:/app/logs
      - whisper-cache:/root/.cache/whisper
    deploy:
      resources:
        limits:
          memory: 4G
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

volumes:
  whisper-cache:
```
#### AMD GPU Configuration (ROCm)
```yaml
version: '3.8'

services:
  talk2me:
    build: .
    ports:
      - "5005:5005"
    environment:
      - GUNICORN_WORKERS=4
      - GUNICORN_THREADS=2
      - HSA_OVERRIDE_GFX_VERSION=10.3.0  # Adjust for your GPU
    volumes:
      - ./logs:/app/logs
      - whisper-cache:/root/.cache/whisper
    devices:
      - /dev/kfd  # ROCm KFD interface
      - /dev/dri  # Direct Rendering Interface
    group_add:
      - video
      - render
    deploy:
      resources:
        limits:
          memory: 4G

volumes:
  whisper-cache:
```
#### Apple Silicon Configuration
```yaml
version: '3.8'

services:
  talk2me:
    build: .
    platform: linux/arm64/v8  # For M1/M2 Macs
    ports:
      - "5005:5005"
    environment:
      - GUNICORN_WORKERS=4
      - GUNICORN_THREADS=2
      - PYTORCH_ENABLE_MPS_FALLBACK=1  # Enable MPS fallback
    volumes:
      - ./logs:/app/logs
      - whisper-cache:/root/.cache/whisper
    deploy:
      resources:
        limits:
          memory: 4G

volumes:
  whisper-cache:
```
#### CPU-Only Configuration
```yaml
version: '3.8'

services:
  talk2me:
    build: .
    ports:
      - "5005:5005"
    environment:
      - GUNICORN_WORKERS=4
      - GUNICORN_THREADS=2
      - OMP_NUM_THREADS=4  # OpenMP threads for CPU
    volumes:
      - ./logs:/app/logs
      - whisper-cache:/root/.cache/whisper
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: '4.0'

volumes:
  whisper-cache:
```
### Nginx Configuration
```nginx
upstream talk2me {
    least_conn;
    server web1:5005 weight=1 max_fails=3 fail_timeout=30s;
    server web2:5005 weight=1 max_fails=3 fail_timeout=30s;
}

server {
    listen 443 ssl http2;
    server_name talk2me.yourdomain.com;

    ssl_certificate     /etc/ssl/certs/talk2me.crt;
    ssl_certificate_key /etc/ssl/private/talk2me.key;

    client_max_body_size 50M;

    location / {
        proxy_pass http://talk2me;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;

        # WebSocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    # Cache static assets
    location /static/ {
        alias /app/static/;
        expires 30d;
        add_header Cache-Control "public, immutable";
    }
}
```
### Systemd Service
```ini
[Unit]
Description=Talk2Me Translation Service
After=network.target

[Service]
Type=notify
User=talk2me
Group=talk2me
WorkingDirectory=/opt/talk2me
Environment="PATH=/opt/talk2me/venv/bin"
ExecStart=/opt/talk2me/venv/bin/gunicorn \
    --config gunicorn_config.py \
    --bind 0.0.0.0:5005 \
    app:app
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```
## API Documentation
### Core Endpoints
#### Transcribe Audio
```http
POST /transcribe
Content-Type: multipart/form-data

audio: (binary)
source_lang: auto|language_code
```
#### Translate Text
```http
POST /translate
Content-Type: application/json

{
  "text": "Hello world",
  "source_lang": "English",
  "target_lang": "Spanish"
}
```
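As a usage sketch, the same request from Python with `requests` (the response schema isn't documented here, so inspect `resp.json()` for your deployment):
```python
import requests

resp = requests.post(
    "http://localhost:5005/translate",
    json={"text": "Hello world", "source_lang": "English", "target_lang": "Spanish"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```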
#### Streaming Translation
```http
POST /translate/stream
Content-Type: application/json

{
  "text": "Long text to translate",
  "source_lang": "auto",
  "target_lang": "French"
}

Response: Server-Sent Events stream
```
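One way to consume the stream from Python (a sketch that assumes standard `data:`-prefixed SSE frames; the exact event payload format is not specified above):
```python
import requests

with requests.post(
    "http://localhost:5005/translate/stream",
    json={"text": "Long text to translate",
          "source_lang": "auto", "target_lang": "French"},
    stream=True,
    timeout=60,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # SSE frames arrive as "data: ..." lines separated by blank lines
        if line and line.startswith("data: "):
            print(line[len("data: "):], flush=True)
```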
#### Text-to-Speech
```http
POST /speak
Content-Type: application/json

{
  "text": "Hola mundo",
  "language": "Spanish"
}
```
### Admin Endpoints
All admin endpoints require the `X-Admin-Token` header.
#### Health & Monitoring
- `GET /health` - Basic health check
- `GET /health/detailed` - Component status
- `GET /metrics` - Prometheus metrics
- `GET /admin/memory` - Memory usage stats
#### Session Management
- `GET /admin/sessions` - List active sessions
- `GET /admin/sessions/:id` - Session details
- `POST /admin/sessions/:id/cleanup` - Manual cleanup
#### Security Controls
- `GET /admin/rate-limits` - View rate limits
- `POST /admin/block-ip` - Block IP address
- `GET /admin/logs/security` - Security events
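For example, checking component status from Python (assumes `ADMIN_TOKEN` is exported in your environment):
```python
import os
import requests

resp = requests.get(
    "http://localhost:5005/health/detailed",
    headers={"X-Admin-Token": os.environ["ADMIN_TOKEN"]},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```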
## Admin Dashboard
Talk2Me includes a comprehensive admin analytics dashboard for monitoring and managing the application.
### Features
- **Real-time Analytics**: Monitor requests, active sessions, and error rates
- **Performance Metrics**: Track response times, throughput, and resource usage
- **System Health**: Monitor Redis, PostgreSQL, and ML services status
- **Language Analytics**: View popular language pairs and usage patterns
- **Error Analysis**: Detailed error tracking with types and trends
- **Data Export**: Download analytics data in JSON format
### Setup
1. **Initialize Database**:
```bash
python init_analytics_db.py
```
2. **Configure Admin Token**:
```bash
export ADMIN_TOKEN="your-secure-admin-token"
```
3. **Access Dashboard**:
- Navigate to `https://yourdomain.com/admin`
- Enter your admin token
- View real-time analytics
### Dashboard Sections
- **Overview Cards**: Key metrics at a glance
- **Request Volume**: Visualize traffic patterns
- **Operations**: Translation and transcription statistics
- **Performance**: Response time percentiles (P95, P99)
- **Error Tracking**: Error types and recent issues
- **System Health**: Component status monitoring
For detailed admin documentation, see [ADMIN_DASHBOARD.md](ADMIN_DASHBOARD.md).
## Development
### TypeScript Development
```bash
# Install dependencies
npm install
# Development mode with auto-compilation
npm run dev
# Build for production
npm run build
# Type checking
npm run typecheck
```
### Project Structure
```
talk2me/
├── app.py                # Main Flask application
├── config.py             # Configuration management
├── requirements.txt      # Python dependencies
├── package.json          # Node.js dependencies
├── tsconfig.json         # TypeScript configuration
├── gunicorn_config.py    # Production server config
├── docker-compose.yml    # Container orchestration
├── static/
│   ├── js/
│   │   ├── src/          # TypeScript source files
│   │   └── dist/         # Compiled JavaScript
│   ├── css/              # Stylesheets
│   └── icons/            # PWA icons
├── templates/            # HTML templates
├── logs/                 # Application logs
└── tests/                # Test suite
```
### Key Components
1. **Connection Management** (`connectionManager.ts`)
- Automatic retry with exponential backoff
- Request queuing during offline periods
- Connection status monitoring
2. **Translation Cache** (`translationCache.ts`)
- IndexedDB for offline support
- LRU eviction policy
- Automatic cache size management
3. **Speaker Management** (`speakerManager.ts`)
- Multi-speaker conversation tracking
- Speaker-specific audio handling
- Conversation export functionality
4. **Error Handling** (`errorBoundary.ts`)
- Global error catching
- Automatic error reporting
- User-friendly error messages
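The retry behavior described under Connection Management above follows the classic exponential-backoff pattern; here is a minimal Python sketch of the idea (illustrative only, not the actual `connectionManager.ts` code):
```python
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Call fn(), doubling the wait after each failure, up to max_delay seconds."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(min(max_delay, base_delay * 2 ** attempt))
```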
### Running Tests
```bash
# Python tests
pytest tests/ -v
# TypeScript tests
npm test
# Integration tests
python test_integration.py
```
## Monitoring & Operations
### Logging System
Talk2Me uses structured JSON logging with multiple streams:
```
logs/
├── talk2me.log       # General application log
├── errors.log        # Error-specific log
├── access.log        # HTTP access log
├── security.log      # Security events
└── performance.log   # Performance metrics
```
View logs:
```bash
# Recent errors
tail -f logs/errors.log | jq '.'
# Security events
grep "rate_limit_exceeded" logs/security.log | jq '.'
# Slow requests
jq 'select(.extra_fields.duration_ms > 1000)' logs/performance.log
```
### Memory Management
Talk2Me includes comprehensive memory leak prevention:
1. **Backend Memory Management**
- GPU memory monitoring
- Automatic model reloading
- Temporary file cleanup
2. **Frontend Memory Management**
- Audio blob cleanup
- WebRTC resource management
- Event listener cleanup
Monitor memory:
```bash
# Check memory stats
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/admin/memory
# Trigger manual cleanup
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
     http://localhost:5005/admin/memory/cleanup
```
### Performance Tuning
#### GPU Optimization
```python
# config.py or environment
GPU_OPTIMIZATIONS = {
    'enabled': True,
    'fp16': True,        # Half precision for 2x speedup
    'batch_size': 1,     # Adjust based on GPU memory
    'num_workers': 2,    # Parallel data loading
    'pin_memory': True,  # Faster GPU transfer
}
```
#### Whisper Optimization
```python
TRANSCRIBE_OPTIONS = {
    'beam_size': 1,    # Faster inference
    'best_of': 1,      # Disable multiple attempts
    'temperature': 0,  # Deterministic output
    'compression_ratio_threshold': 2.4,
    'logprob_threshold': -1.0,
    'no_speech_threshold': 0.6,
}
```
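If you are using the `openai-whisper` package, these keys correspond to `transcribe()` options and can be passed straight through (a sketch with `TRANSCRIBE_OPTIONS` defined as above; the model size and file path are placeholders):
```python
import whisper

model = whisper.load_model("base")  # matches WHISPER_MODEL_SIZE above
result = model.transcribe("recording.wav", **TRANSCRIBE_OPTIONS)
print(result["text"])
```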
### Scaling Considerations
1. **Horizontal Scaling**
- Use Redis for shared rate limiting
- Configure sticky sessions for WebSocket
- Share audio files via object storage
2. **Vertical Scaling**
- Increase worker processes
- Tune thread pool size
- Allocate more GPU memory
3. **Caching Strategy**
- Cache translations in Redis
- Use CDN for static assets
- Enable HTTP caching headers
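For the caching strategy, a cache-aside sketch with `redis-py` (the key scheme, TTL, and `translate_fn` are illustrative assumptions, not Talk2Me's actual implementation):
```python
import hashlib
import redis

r = redis.Redis.from_url("redis://localhost:6379/0")

def cached_translate(text, source_lang, target_lang, translate_fn, ttl=86400):
    """Cache-aside: return a cached translation or compute and store it."""
    key = "xlate:" + hashlib.sha256(
        f"{source_lang}|{target_lang}|{text}".encode()
    ).hexdigest()
    cached = r.get(key)
    if cached is not None:
        return cached.decode()
    result = translate_fn(text, source_lang, target_lang)
    r.set(key, result, ex=ttl)  # expire after `ttl` seconds
    return result
```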
## Troubleshooting
### Common Issues
#### GPU Not Detected
```bash
# Check CUDA availability
python -c "import torch; print(torch.cuda.is_available())"
# Check GPU memory
nvidia-smi
# For AMD GPUs
rocm-smi
# For Apple Silicon
python -c "import torch; print(torch.backends.mps.is_available())"
```
#### High Memory Usage
```bash
# Check for memory leaks
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/health/storage
# Manual cleanup
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
     http://localhost:5005/admin/cleanup
```
#### CORS Issues
```bash
# Test CORS configuration
curl -X OPTIONS http://localhost:5005/api/transcribe \
     -H "Origin: https://yourdomain.com" \
     -H "Access-Control-Request-Method: POST"
```
#### TTS Server Connection
```bash
# Check TTS server status
curl http://localhost:5005/check_tts_server
# Update TTS configuration
curl -X POST http://localhost:5005/update_tts_config \
     -H "Content-Type: application/json" \
     -d '{"server_url": "http://localhost:5050/v1/audio/speech", "api_key": "new-key"}'
```
### Debug Mode
Enable debug logging:
```bash
export FLASK_ENV=development
export LOG_LEVEL=DEBUG
python app.py
```
### Performance Profiling
```bash
# Enable performance logging
export ENABLE_PROFILING=true
# View slow requests
jq 'select(.duration_ms > 1000)' logs/performance.log
```
## Contributing
We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.
### Development Setup
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Run tests (`pytest && npm test`)
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request
### Code Style
- Python: Follow PEP 8
- TypeScript: Use ESLint configuration
- Commit messages: Use conventional commits
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- OpenAI Whisper team for the amazing speech recognition model
- Ollama team for making LLMs accessible
- All contributors who have helped improve Talk2Me
## Support
- **Documentation**: Full docs at [docs.talk2me.app](https://docs.talk2me.app)
- **Issues**: [GitHub Issues](https://github.com/yourusername/talk2me/issues)
- **Discussions**: [GitHub Discussions](https://github.com/yourusername/talk2me/discussions)
- **Security**: Please report security vulnerabilities to security@talk2me.app