3 Commits

b5f2b53262 Housekeeping: Remove unnecessary test and temporary files
- Removed test scripts (test_*.py, tts-debug-script.py)
- Removed test output files (tts_test_output.mp3, test-cors.html)
- Removed redundant static/js/app.js (using TypeScript dist/ instead)
- Removed outdated setup-script.sh
- Removed Python cache directory (__pycache__)
- Removed Claude IDE local settings (.claude/)
- Updated .gitignore with better patterns for:
  - Test files
  - Debug scripts
  - Claude IDE settings
  - Standalone compiled JS

This cleanup reduces repository size and removes temporary/debug files
that shouldn't be version controlled.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-03 09:24:44 -06:00
bcbac5c8b3 Add multi-GPU support for Docker deployments
- Created separate docker-compose override files for different GPU types:
  - docker-compose.nvidia.yml for NVIDIA GPUs
  - docker-compose.amd.yml for AMD GPUs with ROCm
  - docker-compose.apple.yml for Apple Silicon
- Updated README with GPU-specific Docker configurations
- Updated deployment instructions to use appropriate override files
- Added detailed configurations for each GPU type including:
  - Device mappings and drivers
  - Environment variables
  - Platform specifications
  - Memory and resource limits

This allows users to easily deploy Talk2Me with their specific GPU hardware.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-03 09:16:41 -06:00
e5333d8410 Consolidate all documentation into comprehensive README
- Merged 12 separate documentation files into single README.md
- Organized content with clear table of contents
- Maintained all technical details and examples
- Improved overall documentation structure and flow
- Removed redundant separate documentation files

The new README provides a complete guide covering:
- Installation and configuration
- Security features (rate limiting, secrets, sessions)
- Production deployment with Docker/Nginx
- API documentation
- Development guidelines
- Monitoring and troubleshooting

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-03 09:10:58 -06:00
25 changed files with 797 additions and 6290 deletions

.gitignore

@@ -67,3 +67,15 @@ vapid_public.pem
.master_key
secrets.db
*.key
# Test files
test_*.py
*_test_output.*
test-*.html
*-debug-script.py
# Claude IDE
.claude/
# Standalone compiled JS (use dist/ instead)
static/js/app.js


@@ -1,173 +0,0 @@
# Connection Retry Logic Documentation
This document explains the connection retry and network interruption handling features in Talk2Me.
## Overview
Talk2Me implements robust connection retry logic to handle network interruptions gracefully. When a connection is lost or a request fails due to network issues, the application automatically queues requests and retries them when the connection is restored.
## Features
### 1. Automatic Connection Monitoring
- Monitors browser online/offline events
- Periodic health checks to the server (every 5 seconds when offline)
- Visual connection status indicator
- Automatic detection when returning from sleep/hibernation
### 2. Request Queuing
- Failed requests are automatically queued during network interruptions
- Requests maintain their priority and are processed in order
- Queue persists across connection failures
- Visual indication of queued requests
### 3. Exponential Backoff Retry
- Failed requests are retried with exponential backoff
- Initial retry delay: 1 second
- Maximum retry delay: 30 seconds
- Backoff multiplier: 2x
- Maximum retries: 3 attempts
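Concretely, the schedule above works out as follows (a minimal sketch of the arithmetic, not the actual client code, which lives in `connectionManager.ts`):

```python
def backoff_delays(initial_ms: int = 1000, max_ms: int = 30000,
                   multiplier: float = 2.0, max_retries: int = 3) -> list[int]:
    """Return the retry delay in ms for each attempt, capped at max_ms."""
    return [min(int(initial_ms * multiplier ** attempt), max_ms)
            for attempt in range(max_retries)]

print(backoff_delays())  # [1000, 2000, 4000]
```

With the defaults, a request is retried after 1s, 2s, and 4s; longer schedules flatten out at the 30-second cap.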
### 4. Connection Status UI
- Real-time connection status indicator (bottom-right corner)
- Offline banner with retry button
- Queue status showing pending requests by type
- Temporary status messages for important events
## User Experience
### When Connection is Lost
1. **Visual Indicators**:
- Connection status shows "Offline" or "Connection error"
- Red banner appears at top of screen
- Queued request count is displayed
2. **Request Handling**:
- New requests are automatically queued
- User sees "Connection error - queued" message
- Requests will be sent when connection returns
3. **Manual Retry**:
- Users can click "Retry" button in offline banner
- Forces immediate connection check
### When Connection is Restored
1. **Automatic Recovery**:
- Connection status changes to "Connecting..."
- Queued requests are processed automatically
- Success message shown briefly
2. **Request Processing**:
- Queued requests maintain their order
- Higher priority requests (transcription) processed first
- Progress indicators show processing status
## Configuration
The connection retry logic can be configured programmatically:
```javascript
// In app.ts or initialization code
connectionManager.configure({
  maxRetries: 3,             // Maximum retry attempts
  initialDelay: 1000,        // Initial retry delay (ms)
  maxDelay: 30000,           // Maximum retry delay (ms)
  backoffMultiplier: 2,      // Exponential backoff multiplier
  timeout: 10000,            // Request timeout (ms)
  onlineCheckInterval: 5000  // Health check interval (ms)
});
```
## Request Priority
Requests are prioritized as follows:
1. **Transcription** (Priority: 8) - Highest priority
2. **Translation** (Priority: 5) - Normal priority
3. **TTS/Audio** (Priority: 3) - Lower priority
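The ordering rule can be sketched with a standard max-priority queue (illustrative Python, not the TypeScript `RequestQueueManager` itself):

```python
import heapq
from itertools import count

class PriorityQueue:
    """Higher numeric priority is served first; FIFO within a priority."""
    def __init__(self):
        self._heap, self._counter = [], count()

    def enqueue(self, priority: int, item):
        # Negate the priority: heapq is a min-heap, we want max-priority first.
        heapq.heappush(self._heap, (-priority, next(self._counter), item))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

q = PriorityQueue()
q.enqueue(3, "tts")
q.enqueue(8, "transcription")
q.enqueue(5, "translation")
print([q.dequeue() for _ in range(3)])  # ['transcription', 'translation', 'tts']
```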
## Error Types
### Retryable Errors
- Network errors
- Connection timeouts
- Server errors (5xx)
- CORS errors (in some cases)
### Non-Retryable Errors
- Client errors (4xx)
- Authentication errors
- Rate limit errors
- Invalid request errors
## Best Practices
1. **For Users**:
- Wait for queued requests to complete before closing the app
- Use the manual retry button if automatic recovery fails
- Check the connection status indicator for current state
2. **For Developers**:
- All fetch requests should go through RequestQueueManager
- Use appropriate request priorities
- Handle both online and offline scenarios in UI
- Provide clear feedback about connection status
## Technical Implementation
### Key Components
1. **ConnectionManager** (`connectionManager.ts`):
- Monitors connection state
- Implements retry logic with exponential backoff
- Provides connection state subscriptions
2. **RequestQueueManager** (`requestQueue.ts`):
- Queues failed requests
- Integrates with ConnectionManager
- Handles request prioritization
3. **ConnectionUI** (`connectionUI.ts`):
- Displays connection status
- Shows offline banner
- Updates queue information
### Integration Example
```typescript
// Automatic integration through RequestQueueManager
const queue = RequestQueueManager.getInstance();
const data = await queue.enqueue<ResponseType>(
  'translate', // Request type
  async () => {
    // Your fetch request
    const response = await fetch('/api/translate', options);
    return response.json();
  },
  5 // Priority (1-10, higher = more important)
);
```
## Troubleshooting
### Connection Not Detected
- Check browser permissions for network status
- Ensure health endpoint (/health) is accessible
- Verify no firewall/proxy blocking
### Requests Not Retrying
- Check browser console for errors
- Verify request type is retryable
- Check if max retries exceeded
### Queue Not Processing
- Manually trigger retry with button
- Check if requests are timing out
- Verify server is responding
## Future Enhancements
- Persistent queue storage (survive page refresh)
- Configurable retry strategies per request type
- Network speed detection and adaptation
- Progressive web app offline mode


@@ -1,152 +0,0 @@
# CORS Configuration Guide
This document explains how to configure Cross-Origin Resource Sharing (CORS) for the Talk2Me application.
## Overview
CORS is configured using Flask-CORS to enable secure cross-origin usage of the API endpoints. This allows the Talk2Me application to be embedded in other websites or accessed from different domains while maintaining security.
## Environment Variables
### `CORS_ORIGINS`
Controls which domains are allowed to access the API endpoints.
- **Default**: `*` (allows all origins - use only for development)
- **Production Example**: `https://yourdomain.com,https://app.yourdomain.com`
- **Format**: Comma-separated list of allowed origins
```bash
# Development (allows all origins)
export CORS_ORIGINS="*"
# Production (restrict to specific domains)
export CORS_ORIGINS="https://talk2me.example.com,https://app.example.com"
```
### `ADMIN_CORS_ORIGINS`
Controls which domains can access admin endpoints (more restrictive).
- **Default**: `http://localhost:*` (allows all localhost ports)
- **Production Example**: `https://admin.yourdomain.com`
- **Format**: Comma-separated list of allowed admin origins
```bash
# Development
export ADMIN_CORS_ORIGINS="http://localhost:*"
# Production
export ADMIN_CORS_ORIGINS="https://admin.talk2me.example.com"
```
## Configuration Details
The CORS configuration includes:
- **Allowed Methods**: GET, POST, OPTIONS
- **Allowed Headers**: Content-Type, Authorization, X-Requested-With, X-Admin-Token
- **Exposed Headers**: Content-Range, X-Content-Range
- **Credentials Support**: Enabled (supports cookies and authorization headers)
- **Max Age**: 3600 seconds (preflight requests cached for 1 hour)
## Endpoints
All endpoints have CORS enabled with the following configuration:
### Regular API Endpoints
- `/api/*`
- `/transcribe`
- `/translate`
- `/translate/stream`
- `/speak`
- `/get_audio/*`
- `/check_tts_server`
- `/update_tts_config`
- `/health/*`
### Admin Endpoints (More Restrictive)
- `/admin/*` - Uses `ADMIN_CORS_ORIGINS` instead of general `CORS_ORIGINS`
## Security Best Practices
1. **Never use `*` in production** - Always specify exact allowed origins
2. **Use HTTPS** - Always use HTTPS URLs in production CORS origins
3. **Separate admin origins** - Keep admin endpoints on a separate, more restrictive origin list
4. **Review regularly** - Periodically review and update allowed origins
## Example Configurations
### Local Development
```bash
export CORS_ORIGINS="*"
export ADMIN_CORS_ORIGINS="http://localhost:*"
```
### Staging Environment
```bash
export CORS_ORIGINS="https://staging.talk2me.com,https://staging-app.talk2me.com"
export ADMIN_CORS_ORIGINS="https://staging-admin.talk2me.com"
```
### Production Environment
```bash
export CORS_ORIGINS="https://talk2me.com,https://app.talk2me.com"
export ADMIN_CORS_ORIGINS="https://admin.talk2me.com"
```
### Mobile App Integration
```bash
# Include mobile app schemes if needed
export CORS_ORIGINS="https://talk2me.com,https://app.talk2me.com,capacitor://localhost,ionic://localhost"
```
## Testing CORS Configuration
You can test CORS configuration using curl:
```bash
# Test preflight request
curl -X OPTIONS https://your-api.com/api/transcribe \
  -H "Origin: https://allowed-origin.com" \
  -H "Access-Control-Request-Method: POST" \
  -H "Access-Control-Request-Headers: Content-Type" \
  -v

# Test actual request
curl -X POST https://your-api.com/api/transcribe \
  -H "Origin: https://allowed-origin.com" \
  -H "Content-Type: application/json" \
  -d '{"test": "data"}' \
  -v
```
## Troubleshooting
### CORS Errors in Browser Console
If you see CORS errors:
1. Check that the origin is included in `CORS_ORIGINS`
2. Ensure the URL protocol matches (http vs https)
3. Check for trailing slashes in origins
4. Verify environment variables are set correctly
### Common Issues
1. **"No 'Access-Control-Allow-Origin' header"**
- Origin not in allowed list
- Check `CORS_ORIGINS` environment variable
2. **"CORS policy: The request client is not a secure context"**
- Using HTTP instead of HTTPS
- Update to use HTTPS in production
3. **"CORS policy: Credentials flag is true, but Access-Control-Allow-Credentials is not 'true'"**
- This should not occur with current configuration
- Check that `supports_credentials` is True in CORS config
## Additional Resources
- [MDN CORS Documentation](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS)
- [Flask-CORS Documentation](https://flask-cors.readthedocs.io/)


@@ -1,460 +0,0 @@
# Error Logging Documentation
This document describes the comprehensive error logging system implemented in Talk2Me for debugging production issues.
## Overview
Talk2Me implements a structured logging system that provides:
- JSON-formatted structured logs for easy parsing
- Multiple log streams (app, errors, access, security, performance)
- Automatic log rotation to prevent disk space issues
- Request tracing with unique IDs
- Performance metrics collection
- Security event tracking
- Error deduplication and frequency tracking
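JSON log lines like the ones below can be produced with a small custom formatter (a minimal sketch assuming a `logging.Formatter` subclass; the app's real `error_logger` module may differ):

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (illustrative)."""
    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge any structured context attached via extra_fields.
        entry.update(getattr(record, "extra_fields", {}))
        return json.dumps(entry)

logger = logging.getLogger("talk2me.demo")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.warning("Whisper model loaded successfully")
```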
## Log Types
### 1. Application Logs (`logs/talk2me.log`)
General application logs including info, warnings, and debug messages.
```json
{
  "timestamp": "2024-01-15T10:30:45.123Z",
  "level": "INFO",
  "logger": "talk2me",
  "message": "Whisper model loaded successfully",
  "app": "talk2me",
  "environment": "production",
  "hostname": "server-1",
  "thread": "MainThread",
  "process": 12345
}
```
### 2. Error Logs (`logs/errors.log`)
Dedicated error logging with full exception details and stack traces.
```json
{
  "timestamp": "2024-01-15T10:31:00.456Z",
  "level": "ERROR",
  "logger": "talk2me.errors",
  "message": "Error in transcribe: File too large",
  "exception": {
    "type": "ValueError",
    "message": "Audio file exceeds maximum size",
    "traceback": ["...full stack trace..."]
  },
  "request_id": "1234567890-abcdef",
  "endpoint": "transcribe",
  "method": "POST",
  "path": "/transcribe",
  "ip": "192.168.1.100"
}
```
### 3. Access Logs (`logs/access.log`)
HTTP request/response logging for traffic analysis.
```json
{
  "timestamp": "2024-01-15T10:32:00.789Z",
  "level": "INFO",
  "message": "request_complete",
  "request_id": "1234567890-abcdef",
  "method": "POST",
  "path": "/transcribe",
  "status": 200,
  "duration_ms": 1250,
  "content_length": 4096,
  "ip": "192.168.1.100",
  "user_agent": "Mozilla/5.0..."
}
```
### 4. Security Logs (`logs/security.log`)
Security-related events and suspicious activities.
```json
{
  "timestamp": "2024-01-15T10:33:00.123Z",
  "level": "WARNING",
  "message": "Security event: rate_limit_exceeded",
  "event": "rate_limit_exceeded",
  "severity": "warning",
  "ip": "192.168.1.100",
  "endpoint": "/transcribe",
  "attempts": 15,
  "blocked": true
}
```
### 5. Performance Logs (`logs/performance.log`)
Performance metrics and slow request tracking.
```json
{
  "timestamp": "2024-01-15T10:34:00.456Z",
  "level": "INFO",
  "message": "Performance metric: transcribe_audio",
  "metric": "transcribe_audio",
  "duration_ms": 2500,
  "function": "transcribe",
  "module": "app",
  "request_id": "1234567890-abcdef"
}
```
## Configuration
### Environment Variables
```bash
# Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
export LOG_LEVEL=INFO
# Log file paths
export LOG_FILE=logs/talk2me.log
export ERROR_LOG_FILE=logs/errors.log
# Log rotation settings
export LOG_MAX_BYTES=52428800 # 50MB
export LOG_BACKUP_COUNT=10 # Keep 10 backup files
# Environment
export FLASK_ENV=production
```
### Flask Configuration
```python
app.config.update({
    'LOG_LEVEL': 'INFO',
    'LOG_FILE': 'logs/talk2me.log',
    'ERROR_LOG_FILE': 'logs/errors.log',
    'LOG_MAX_BYTES': 50 * 1024 * 1024,
    'LOG_BACKUP_COUNT': 10
})
```
## Admin API Endpoints
### GET /admin/logs/errors
View recent error logs and error frequency statistics.
```bash
curl -H "X-Admin-Token: your-token" http://localhost:5005/admin/logs/errors
```
Response:
```json
{
  "error_summary": {
    "abc123def456": {
      "count_last_hour": 5,
      "last_seen": 1705320000
    }
  },
  "recent_errors": [...],
  "total_errors_logged": 150
}
```
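Keys like `abc123def456` suggest errors are deduplicated by a stable fingerprint. One way to sketch that (illustrative only; the field names `count_last_hour` and `last_seen` are taken from the response above):

```python
import hashlib
import time
from collections import defaultdict

def fingerprint(exc: BaseException) -> str:
    """Stable ID for 'the same' error: hash of type + message (illustrative)."""
    raw = f"{type(exc).__name__}:{exc}"
    return hashlib.sha256(raw.encode()).hexdigest()[:12]

error_summary = defaultdict(lambda: {"count_last_hour": 0, "last_seen": 0})

def record_error(exc: BaseException) -> None:
    entry = error_summary[fingerprint(exc)]
    entry["count_last_hour"] += 1
    entry["last_seen"] = int(time.time())

# Five occurrences of the same error collapse into one summary entry.
for _ in range(5):
    record_error(ValueError("Audio file exceeds maximum size"))
print(len(error_summary), next(iter(error_summary.values()))["count_last_hour"])  # 1 5
```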
### GET /admin/logs/performance
View performance metrics and slow requests.
```bash
curl -H "X-Admin-Token: your-token" http://localhost:5005/admin/logs/performance
```
Response:
```json
{
  "performance_metrics": {
    "transcribe_audio": {
      "avg_ms": 850.5,
      "max_ms": 3200,
      "min_ms": 125,
      "count": 1024
    }
  },
  "slow_requests": [
    {
      "metric": "transcribe_audio",
      "duration_ms": 3200,
      "timestamp": "2024-01-15T10:35:00Z"
    }
  ]
}
```
### GET /admin/logs/security
View security events and suspicious activities.
```bash
curl -H "X-Admin-Token: your-token" http://localhost:5005/admin/logs/security
```
Response:
```json
{
  "security_events": [...],
  "event_summary": {
    "rate_limit_exceeded": 25,
    "suspicious_error": 3,
    "high_error_rate": 1
  },
  "total_events": 29
}
```
## Usage Patterns
### 1. Logging Errors with Context
```python
from error_logger import log_exception
try:
    # Some operation
    process_audio(file)
except Exception as e:
    log_exception(
        e,
        message="Failed to process audio",
        user_id=user.id,
        file_size=file.size,
        file_type=file.content_type
    )
```
### 2. Performance Monitoring
```python
from error_logger import log_performance
@log_performance('expensive_operation')
def process_large_file(file):
    # This will automatically log execution time
    return processed_data
```
### 3. Security Event Logging
```python
app.error_logger.log_security(
    'unauthorized_access',
    severity='warning',
    ip=request.remote_addr,
    attempted_resource='/admin',
    user_agent=request.headers.get('User-Agent')
)
```
### 4. Request Context
```python
from error_logger import log_context
with log_context(user_id=user.id, feature='translation'):
    # All logs within this context will include user_id and feature
    translate_text(text)
```
## Log Analysis
### Finding Specific Errors
```bash
# Find all authentication errors
grep '"error_type":"AuthenticationError"' logs/errors.log | jq .
# Find errors from specific IP
grep '"ip":"192.168.1.100"' logs/errors.log | jq .
# Find errors in last hour
grep "$(date -u -d '1 hour ago' +%Y-%m-%dT%H)" logs/errors.log | jq .
```
### Performance Analysis
```bash
# Find slow requests (>2000ms)
jq 'select(.extra_fields.duration_ms > 2000)' logs/performance.log
# Calculate average response time for endpoint
jq 'select(.extra_fields.metric == "transcribe_audio") | .extra_fields.duration_ms' logs/performance.log | awk '{sum+=$1; count++} END {print sum/count}'
```
### Security Monitoring
```bash
# Count security events by type
jq '.extra_fields.event' logs/security.log | sort | uniq -c
# Find all blocked IPs
jq 'select(.extra_fields.blocked == true) | .extra_fields.ip' logs/security.log | sort -u
```
## Log Rotation
Logs are automatically rotated based on size or time:
- **Application/Error logs**: Rotate at 50MB, keep 10 backups
- **Access logs**: Daily rotation, keep 30 days
- **Performance logs**: Hourly rotation, keep 7 days
- **Security logs**: Rotate at 50MB, keep 10 backups
Rotated logs are named with numeric suffixes:
- `talk2me.log` (current)
- `talk2me.log.1` (most recent backup)
- `talk2me.log.2` (older backup)
- etc.
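The size-based rotation described here maps directly onto the standard library's `RotatingFileHandler` (a sketch under that assumption; the temporary directory stands in for `logs/`):

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

log_dir = tempfile.mkdtemp()  # stand-in for the real logs/ directory
path = os.path.join(log_dir, "talk2me.log")

handler = RotatingFileHandler(
    path,
    maxBytes=50 * 1024 * 1024,  # rotate at 50MB
    backupCount=10,             # keep talk2me.log.1 .. talk2me.log.10
)
logger = logging.getLogger("talk2me.rotation-demo")
logger.addHandler(handler)
logger.warning("hello")
handler.flush()
print(os.path.exists(path))  # True
```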
## Best Practices
### 1. Structured Logging
Always include relevant context:
```python
logger.info("User action completed", extra={
    'extra_fields': {
        'user_id': user.id,
        'action': 'upload_audio',
        'file_size': file.size,
        'duration_ms': processing_time
    }
})
```
### 2. Error Handling
Log errors at appropriate levels:
```python
try:
    result = risky_operation()
except ValidationError as e:
    logger.warning(f"Validation failed: {e}")  # Expected errors
except Exception as e:
    logger.error(f"Unexpected error: {e}", exc_info=True)  # Unexpected errors
```
### 3. Performance Tracking
Track key operations:
```python
start = time.time()
result = expensive_operation()
duration = (time.time() - start) * 1000
app.error_logger.log_performance(
    'expensive_operation',
    value=duration,
    input_size=len(data),
    output_size=len(result)
)
```
### 4. Security Awareness
Log security-relevant events:
```python
if failed_attempts > 3:
    app.error_logger.log_security(
        'multiple_failed_attempts',
        severity='warning',
        ip=request.remote_addr,
        attempts=failed_attempts
    )
```
## Monitoring Integration
### Prometheus Metrics
Export log metrics for Prometheus:
```python
@app.route('/metrics')
def prometheus_metrics():
    error_summary = app.error_logger.get_error_summary()
    # Format as Prometheus metrics
    return format_prometheus_metrics(error_summary)
```
### ELK Stack
Ship logs to Elasticsearch:
```yaml
filebeat.inputs:
  - type: log
    paths:
      - /app/logs/*.log
    json.keys_under_root: true
    json.add_error_key: true
```
### CloudWatch
For AWS deployments:
```python
# Install boto3 and watchtower
import watchtower
cloudwatch_handler = watchtower.CloudWatchLogHandler()
logger.addHandler(cloudwatch_handler)
```
## Troubleshooting
### Common Issues
#### 1. Logs Not Being Written
Check permissions:
```bash
ls -la logs/
# Should show write permissions for app user
```
Create logs directory:
```bash
mkdir -p logs
chmod 755 logs
```
#### 2. Disk Space Issues
Monitor log sizes:
```bash
du -sh logs/*
```
Force rotation:
```bash
# Manually rotate logs
mv logs/talk2me.log logs/talk2me.log.backup
# App will create new log file
```
#### 3. Performance Impact
If logging impacts performance:
- Increase LOG_LEVEL to WARNING or ERROR
- Reduce backup count
- Use asynchronous logging (future enhancement)
## Security Considerations
1. **Log Sanitization**: Sensitive data is automatically masked
2. **Access Control**: Admin endpoints require authentication
3. **Log Retention**: Old logs are automatically deleted
4. **Encryption**: Consider encrypting logs at rest in production
5. **Audit Trail**: All log access is itself logged
## Future Enhancements
1. **Centralized Logging**: Ship logs to centralized service
2. **Real-time Alerts**: Trigger alerts on error patterns
3. **Log Analytics**: Built-in log analysis dashboard
4. **Correlation IDs**: Track requests across microservices
5. **Async Logging**: Reduce performance impact


@@ -1,68 +0,0 @@
# GPU Support for Talk2Me
## Current GPU Support Status
### ✅ NVIDIA GPUs (Full Support)
- **Requirements**: CUDA 11.x or 12.x
- **Optimizations**:
- TensorFloat-32 (TF32) for Ampere GPUs (RTX 30xx, A100)
- cuDNN auto-tuning
- Half-precision (FP16) inference
- CUDA kernel pre-caching
- Memory pre-allocation
### ⚠️ AMD GPUs (Limited Support)
- **Requirements**: ROCm 5.x installation
- **Status**: Falls back to CPU unless ROCm is properly configured
- **To enable AMD GPU**:
```bash
# Install PyTorch with ROCm support
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6
```
- **Limitations**:
- No cuDNN optimizations
- May have compatibility issues
- Performance varies by GPU model
### ✅ Apple Silicon (M1/M2/M3)
- **Requirements**: macOS 12.3+
- **Status**: Uses Metal Performance Shaders (MPS)
- **Optimizations**:
- Native Metal acceleration
- Unified memory architecture benefits
- No FP16 (not well supported on MPS yet)
### 📊 Performance Comparison
| GPU Type | First Transcription | Subsequent | Notes |
|----------|-------------------|------------|-------|
| NVIDIA RTX 3080 | ~2s | ~0.5s | Full optimizations |
| AMD RX 6800 XT | ~3-4s | ~1-2s | With ROCm |
| Apple M2 | ~2.5s | ~1s | MPS acceleration |
| CPU (i7-12700K) | ~5-10s | ~5-10s | No acceleration |
## Checking Your GPU Status
Run the app and check the logs:
```
INFO: NVIDIA GPU detected - using CUDA acceleration
INFO: GPU memory allocated: 542.00 MB
INFO: Whisper model loaded and optimized for NVIDIA GPU
```
## Troubleshooting
### AMD GPU Not Detected
1. Install ROCm-compatible PyTorch
2. Set environment variable: `export HSA_OVERRIDE_GFX_VERSION=10.3.0`
3. Check with: `rocm-smi`
### NVIDIA GPU Not Used
1. Check CUDA installation: `nvidia-smi`
2. Verify PyTorch CUDA: `python -c "import torch; print(torch.cuda.is_available())"`
3. Install CUDA toolkit if needed
### Apple Silicon Not Accelerated
1. Update macOS to 12.3+
2. Update PyTorch: `pip install --upgrade torch`
3. Check MPS: `python -c "import torch; print(torch.backends.mps.is_available())"`


@@ -1,285 +0,0 @@
# Memory Management Documentation
This document describes the comprehensive memory management system implemented in Talk2Me to prevent memory leaks and crashes after extended use.
## Overview
Talk2Me implements a dual-layer memory management system:
1. **Backend (Python)**: Manages GPU memory, Whisper model, and temporary files
2. **Frontend (JavaScript)**: Manages audio blobs, object URLs, and Web Audio contexts
## Memory Leak Issues Addressed
### Backend Memory Leaks
1. **GPU Memory Fragmentation**
- Whisper model accumulates GPU memory over time
- Solution: Periodic GPU cache clearing and model reloading
2. **Temporary File Accumulation**
- Audio files not cleaned up quickly enough under load
- Solution: Aggressive cleanup with tracking and periodic sweeps
3. **Session Resource Leaks**
- Long-lived sessions accumulate resources
- Solution: Integration with session manager for resource limits
### Frontend Memory Leaks
1. **Audio Blob Leaks**
- MediaRecorder chunks kept in memory
- Solution: SafeMediaRecorder wrapper with automatic cleanup
2. **Object URL Leaks**
- URLs created but not revoked
- Solution: Centralized tracking and automatic revocation
3. **AudioContext Leaks**
- Contexts created but never closed
- Solution: MemoryManager tracks and closes contexts
4. **MediaStream Leaks**
- Microphone streams not properly stopped
- Solution: Automatic track stopping and stream cleanup
## Backend Memory Management
### MemoryManager Class
The `MemoryManager` monitors and manages memory usage:
```python
memory_manager = MemoryManager(app, {
    'memory_threshold_mb': 4096,      # 4GB process memory limit
    'gpu_memory_threshold_mb': 2048,  # 2GB GPU memory limit
    'cleanup_interval': 30            # Check every 30 seconds
})
```
### Features
1. **Automatic Monitoring**
- Background thread checks memory usage
- Triggers cleanup when thresholds exceeded
- Logs statistics every 5 minutes
2. **GPU Memory Management**
- Clears CUDA cache after each operation
- Reloads Whisper model if fragmentation detected
- Tracks reload count and timing
3. **Temporary File Cleanup**
- Tracks all temporary files
- Age-based cleanup (5 minutes normal, 1 minute aggressive)
- Cleanup on process exit
4. **Context Managers**
```python
with AudioProcessingContext(memory_manager) as ctx:
    # Process audio
    ctx.add_temp_file(temp_path)
# Files automatically cleaned up
```
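The age-based cleanup can be sketched as a periodic sweep over the temp directory (a hypothetical helper; the real `MemoryManager` also tracks registered files explicitly):

```python
import os
import tempfile
import time
from pathlib import Path

def sweep_temp_files(directory: str, max_age_s: float = 300.0) -> int:
    """Delete files older than max_age_s; return how many were removed."""
    removed = 0
    for path in Path(directory).iterdir():
        if path.is_file() and time.time() - path.stat().st_mtime > max_age_s:
            path.unlink()
            removed += 1
    return removed

# Demo: one stale file (backdated 10 minutes), one fresh file.
d = tempfile.mkdtemp()
stale, fresh = Path(d, "old.wav"), Path(d, "new.wav")
stale.touch()
fresh.touch()
os.utime(stale, (time.time() - 600, time.time() - 600))
print(sweep_temp_files(d))  # 1
```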
### Admin Endpoints
- `GET /admin/memory` - View current memory statistics
- `POST /admin/memory/cleanup` - Trigger manual cleanup
## Frontend Memory Management
### MemoryManager Class
Centralized tracking of all browser resources:
```typescript
const memoryManager = MemoryManager.getInstance();
// Register resources
memoryManager.registerAudioContext(context);
memoryManager.registerObjectURL(url);
memoryManager.registerMediaStream(stream);
```
### SafeMediaRecorder
Wrapper for MediaRecorder with automatic cleanup:
```typescript
const recorder = new SafeMediaRecorder();
await recorder.start(constraints);
// Recording...
const blob = await recorder.stop(); // Automatically cleans up
```
### AudioBlobHandler
Safe handling of audio blobs and object URLs:
```typescript
const handler = new AudioBlobHandler(blob);
const url = handler.getObjectURL(); // Tracked automatically
// Use URL...
handler.cleanup(); // Revokes URL and clears references
```
## Memory Thresholds
### Backend Thresholds
| Resource | Default Limit | Configurable Via |
|----------|--------------|------------------|
| Process Memory | 4096 MB | MEMORY_THRESHOLD_MB |
| GPU Memory | 2048 MB | GPU_MEMORY_THRESHOLD_MB |
| Temp File Age | 300 seconds | Built-in |
| Model Reload Interval | 300 seconds | Built-in |
### Frontend Thresholds
| Resource | Cleanup Trigger |
|----------|----------------|
| Closed AudioContexts | Every 30 seconds |
| Stopped MediaStreams | Every 30 seconds |
| Orphaned Object URLs | On navigation/unload |
## Best Practices
### Backend
1. **Use Context Managers**
```python
@with_memory_management
def process_audio():
    # Automatic cleanup
    ...
```
2. **Register Temporary Files**
```python
register_temp_file(path)
ctx.add_temp_file(path)
```
3. **Clear GPU Memory**
```python
torch.cuda.empty_cache()
torch.cuda.synchronize()
```
### Frontend
1. **Use Safe Wrappers**
```typescript
// Don't use raw MediaRecorder
const recorder = new SafeMediaRecorder();
```
2. **Clean Up Handlers**
```typescript
if (audioHandler) {
  audioHandler.cleanup();
}
```
3. **Register All Resources**
```typescript
const context = new AudioContext();
memoryManager.registerAudioContext(context);
```
## Monitoring
### Backend Monitoring
```bash
# View memory stats
curl -H "X-Admin-Token: token" http://localhost:5005/admin/memory
# Response
{
  "memory": {
    "process_mb": 850.5,
    "system_percent": 45.2,
    "gpu_mb": 1250.0,
    "gpu_percent": 61.0
  },
  "temp_files": {
    "count": 5,
    "size_mb": 12.5
  },
  "model": {
    "reload_count": 2,
    "last_reload": "2024-01-15T10:30:00"
  }
}
```
### Frontend Monitoring
```javascript
// Get memory stats
const stats = memoryManager.getStats();
console.log('Active contexts:', stats.audioContexts);
console.log('Object URLs:', stats.objectURLs);
```
## Troubleshooting
### High Memory Usage
1. **Check Current Usage**
```bash
curl -H "X-Admin-Token: token" http://localhost:5005/admin/memory
```
2. **Trigger Manual Cleanup**
```bash
curl -X POST -H "X-Admin-Token: token" \
http://localhost:5005/admin/memory/cleanup
```
3. **Check Logs**
```bash
grep "Memory" logs/talk2me.log
grep "GPU memory" logs/talk2me.log
```
### Memory Leak Symptoms
1. **Backend**
- Process memory continuously increasing
- GPU memory not returning to baseline
- Temp files accumulating in upload folder
- Slower transcription over time
2. **Frontend**
- Browser tab memory increasing
- Page becoming unresponsive
- Audio playback issues
- Console errors about contexts
### Debug Mode
Enable debug logging:
```python
# Backend
app.config['DEBUG_MEMORY'] = True
# Frontend (in console)
localStorage.setItem('DEBUG_MEMORY', 'true');
```
## Performance Impact
Memory management adds minimal overhead:
- Backend: ~30ms per cleanup cycle
- Frontend: <5ms per resource registration
- Cleanup operations are non-blocking
- Model reloading takes ~2-3 seconds (rare)
## Future Enhancements
1. **Predictive Cleanup**: Clean resources based on usage patterns
2. **Memory Pooling**: Reuse audio buffers and contexts
3. **Distributed Memory**: Share memory stats across instances
4. **Alert System**: Notify admins of memory issues
5. **Auto-scaling**: Scale resources based on memory pressure


@@ -1,435 +0,0 @@
# Production Deployment Guide
This guide covers deploying Talk2Me in a production environment using a proper WSGI server.
## Overview
The Flask development server is not suitable for production use. This guide covers:
- Gunicorn as the WSGI server
- Nginx as a reverse proxy
- Docker for containerization
- Systemd for process management
- Security best practices
## Quick Start with Docker
### 1. Using Docker Compose
```bash
# Clone the repository
git clone https://github.com/your-repo/talk2me.git
cd talk2me
# Create .env file with production settings
cat > .env <<EOF
TTS_API_KEY=your-api-key
ADMIN_TOKEN=your-secure-admin-token
SECRET_KEY=your-secure-secret-key
POSTGRES_PASSWORD=your-secure-db-password
EOF
# Build and start services
docker-compose up -d
# Check status
docker-compose ps
docker-compose logs -f talk2me
```
### 2. Using Docker (standalone)
```bash
# Build the image
docker build -t talk2me .
# Run the container
docker run -d \
  --name talk2me \
  -p 5005:5005 \
  -e TTS_API_KEY=your-api-key \
  -e ADMIN_TOKEN=your-secure-token \
  -e SECRET_KEY=your-secure-key \
  -v $(pwd)/logs:/app/logs \
  talk2me
```
## Manual Deployment
### 1. System Requirements
- Ubuntu 20.04+ or similar Linux distribution
- Python 3.8+
- Nginx
- Systemd
- 4GB+ RAM recommended
- GPU (optional, for faster transcription)
### 2. Installation
Run the deployment script as root:
```bash
sudo ./deploy.sh
```
Or manually:
```bash
# Install system dependencies
sudo apt-get update
sudo apt-get install -y python3-pip python3-venv nginx
# Create application user
sudo useradd -m -s /bin/bash talk2me
# Create directories
sudo mkdir -p /opt/talk2me /var/log/talk2me
sudo chown talk2me:talk2me /opt/talk2me /var/log/talk2me
# Copy application files
sudo cp -r . /opt/talk2me/
sudo chown -R talk2me:talk2me /opt/talk2me
# Install Python dependencies
sudo -u talk2me python3 -m venv /opt/talk2me/venv
sudo -u talk2me /opt/talk2me/venv/bin/pip install -r requirements-prod.txt
# Configure and start services
sudo cp talk2me.service /etc/systemd/system/
sudo systemctl enable talk2me
sudo systemctl start talk2me
```
## Gunicorn Configuration
The `gunicorn_config.py` file contains production-ready settings:
### Worker Configuration
```python
# Number of worker processes
workers = multiprocessing.cpu_count() * 2 + 1
# Worker timeout (increased for audio processing)
timeout = 120
# Restart workers periodically to prevent memory leaks
max_requests = 1000
max_requests_jitter = 50
```
### Performance Tuning
For different workloads:
```bash
# CPU-bound (transcription heavy)
export GUNICORN_WORKERS=8
export GUNICORN_THREADS=1
# I/O-bound (many concurrent requests)
export GUNICORN_WORKERS=4
export GUNICORN_THREADS=4
export GUNICORN_WORKER_CLASS=gthread
# Async (best concurrency)
export GUNICORN_WORKER_CLASS=gevent
export GUNICORN_WORKER_CONNECTIONS=1000
```
## Nginx Configuration
### Basic Setup
The provided `nginx.conf` includes:
- Reverse proxy to Gunicorn
- Static file serving
- WebSocket support
- Security headers
- Gzip compression
### SSL/TLS Setup
```nginx
server {
listen 443 ssl http2;
server_name your-domain.com;
ssl_certificate /etc/letsencrypt/live/your-domain.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/your-domain.com/privkey.pem;
# Strong SSL configuration
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
ssl_prefer_server_ciphers off;
# HSTS
add_header Strict-Transport-Security "max-age=63072000" always;
}
```
## Environment Variables
### Required
```bash
# Security
SECRET_KEY=your-very-secure-secret-key
ADMIN_TOKEN=your-admin-api-token
# TTS Configuration
TTS_API_KEY=your-tts-api-key
TTS_SERVER_URL=http://your-tts-server:5050/v1/audio/speech
# Flask
FLASK_ENV=production
```
### Optional
```bash
# Performance
GUNICORN_WORKERS=4
GUNICORN_THREADS=2
MEMORY_THRESHOLD_MB=4096
GPU_MEMORY_THRESHOLD_MB=2048
# Database (for session storage)
DATABASE_URL=postgresql://user:pass@localhost/talk2me
REDIS_URL=redis://localhost:6379/0
# Monitoring
SENTRY_DSN=your-sentry-dsn
```
## Monitoring
### Health Checks
```bash
# Basic health check
curl http://localhost:5005/health
# Detailed health check
curl http://localhost:5005/health/detailed
# Memory usage
curl -H "X-Admin-Token: your-token" http://localhost:5005/admin/memory
```
### Logs
```bash
# Application logs
tail -f /var/log/talk2me/talk2me.log
# Error logs
tail -f /var/log/talk2me/errors.log
# Gunicorn logs
journalctl -u talk2me -f
# Nginx logs
tail -f /var/log/nginx/access.log
tail -f /var/log/nginx/error.log
```
### Metrics
With Prometheus client installed:
```bash
# Prometheus metrics endpoint
curl http://localhost:5005/metrics
```
## Scaling
### Horizontal Scaling
For multiple servers:
1. Use Redis for session storage
2. Use PostgreSQL for persistent data
3. Load balance with Nginx:
```nginx
upstream talk2me_backends {
least_conn;
server server1:5005 weight=1;
server server2:5005 weight=1;
server server3:5005 weight=1;
}
```
### Vertical Scaling
Adjust based on load:
```bash
# High memory usage
MEMORY_THRESHOLD_MB=8192
GPU_MEMORY_THRESHOLD_MB=4096
# More workers
GUNICORN_WORKERS=16
GUNICORN_THREADS=4
# Larger file limits
client_max_body_size 100M;
```
## Security
### Firewall
```bash
# Allow only necessary ports
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow 22/tcp
sudo ufw enable
```
### File Permissions
```bash
# Secure file permissions
sudo chmod 750 /opt/talk2me
sudo chmod 640 /opt/talk2me/.env
sudo chmod 755 /opt/talk2me/static
```
### AppArmor/SELinux
Create security profiles to restrict application access.
## Backup
### Database Backup
```bash
# PostgreSQL
pg_dump talk2me > backup.sql
# Redis
redis-cli BGSAVE
```
### Application Backup
```bash
# Backup application and logs
tar -czf talk2me-backup.tar.gz \
/opt/talk2me \
/var/log/talk2me \
/etc/systemd/system/talk2me.service \
/etc/nginx/sites-available/talk2me
```
## Troubleshooting
### Service Won't Start
```bash
# Check service status
systemctl status talk2me
# Check logs
journalctl -u talk2me -n 100
# Test configuration
sudo -u talk2me /opt/talk2me/venv/bin/gunicorn --check-config wsgi:application
```
### High Memory Usage
```bash
# Trigger cleanup
curl -X POST -H "X-Admin-Token: token" http://localhost:5005/admin/memory/cleanup
# Restart workers
systemctl reload talk2me
```
### Slow Response Times
1. Check worker count
2. Enable async workers
3. Check GPU availability
4. Review nginx buffering settings
## Performance Optimization
### 1. Enable GPU
Ensure CUDA/ROCm is properly installed:
```bash
# Check GPU
nvidia-smi # or rocm-smi
# Set in environment
export CUDA_VISIBLE_DEVICES=0
```
### 2. Optimize Workers
```python
# For CPU-heavy workloads
workers = cpu_count()
threads = 1
# For I/O-heavy workloads
workers = cpu_count() * 2
threads = 4
```
### 3. Enable Caching
Use Redis for caching translations:
```python
CACHE_TYPE = 'redis'
CACHE_REDIS_URL = 'redis://localhost:6379/0'
```
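A minimal sketch of such a translation cache, with the cache client injected so the same code works against a `redis.Redis` instance or an in-memory stand-in. The function and key names here are illustrative, not the app's actual API:

```python
import hashlib

def cached_translate(cache, text, source_lang, target_lang, translate_fn, ttl=86400):
    """Return a cached translation if present; otherwise translate and store it.

    `cache` needs only get(key) and setex(key, ttl, value) -- e.g. redis.Redis
    connected to redis://localhost:6379/0, or an in-memory stand-in for tests.
    """
    key = "xlate:" + hashlib.sha256(
        f"{source_lang}|{target_lang}|{text}".encode()
    ).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return hit.decode() if isinstance(hit, bytes) else hit
    result = translate_fn(text, source_lang, target_lang)
    cache.setex(key, ttl, result)  # expire after `ttl` seconds
    return result
```

Keying on a hash of the full (text, language pair) tuple keeps keys bounded in size while making repeated translations of identical input free.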
## Maintenance
### Regular Tasks
1. **Log Rotation**: Configured automatically
2. **Database Cleanup**: Run weekly
3. **Model Updates**: Check for Whisper updates
4. **Security Updates**: Keep dependencies updated
### Update Procedure
```bash
# Backup first
./backup.sh
# Update code
git pull
# Update dependencies
sudo -u talk2me /opt/talk2me/venv/bin/pip install -r requirements-prod.txt
# Restart service
sudo systemctl restart talk2me
```
## Rollback
If deployment fails:
```bash
# Stop service
sudo systemctl stop talk2me
# Restore backup
tar -xzf talk2me-backup.tar.gz -C /
# Restart service
sudo systemctl start talk2me
```


@@ -1,235 +0,0 @@
# Rate Limiting Documentation
This document describes the rate limiting implementation in Talk2Me to protect against DoS attacks and resource exhaustion.
## Overview
Talk2Me implements a comprehensive rate limiting system with:
- Token bucket algorithm with sliding window
- Per-endpoint configurable limits
- IP-based blocking (temporary and permanent)
- Global request limits
- Concurrent request throttling
- Request size validation
## Rate Limits by Endpoint
### Transcription (`/transcribe`)
- **Per Minute**: 10 requests
- **Per Hour**: 100 requests
- **Burst Size**: 3 requests
- **Max Request Size**: 10MB
- **Token Refresh**: 1 token per 6 seconds
### Translation (`/translate`)
- **Per Minute**: 20 requests
- **Per Hour**: 300 requests
- **Burst Size**: 5 requests
- **Max Request Size**: 100KB
- **Token Refresh**: 1 token per 3 seconds
### Streaming Translation (`/translate/stream`)
- **Per Minute**: 10 requests
- **Per Hour**: 150 requests
- **Burst Size**: 3 requests
- **Max Request Size**: 100KB
- **Token Refresh**: 1 token per 6 seconds
### Text-to-Speech (`/speak`)
- **Per Minute**: 15 requests
- **Per Hour**: 200 requests
- **Burst Size**: 3 requests
- **Max Request Size**: 50KB
- **Token Refresh**: 1 token per 4 seconds
### API Endpoints
- Push notifications and error logging: various per-endpoint limits (see `rate_limiter.py`)
## Global Limits
- **Total Requests Per Minute**: 1,000 (across all endpoints)
- **Total Requests Per Hour**: 10,000
- **Concurrent Requests**: 50 maximum
## Rate Limiting Headers
Successful responses include:
```
X-RateLimit-Limit: 20
X-RateLimit-Remaining: 15
X-RateLimit-Reset: 1234567890
```
Rate limited responses (429) include:
```
X-RateLimit-Limit: 20
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1234567890
Retry-After: 60
```
## Client Identification
Clients are identified by:
- IP address (including X-Forwarded-For support)
- User-Agent string
- Combined hash for uniqueness
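A sketch of this fingerprinting scheme (illustrative; the exact hash and truncation in `rate_limiter.py` may differ):

```python
import hashlib

def client_id(remote_addr, forwarded_for, user_agent):
    """Fingerprint a client from its effective IP and User-Agent.

    X-Forwarded-For may carry a chain "client, proxy1, proxy2";
    the first entry is the originating client.
    """
    ip = forwarded_for.split(",")[0].strip() if forwarded_for else remote_addr
    return hashlib.sha256(f"{ip}|{user_agent}".encode()).hexdigest()[:16]
```

Combining IP and User-Agent distinguishes multiple clients behind one NAT while keeping a single client's identity stable across requests.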
## Automatic Blocking
IPs are temporarily blocked for 1 hour if:
- They exceed 100 requests per minute
- They repeatedly hit rate limits
- They exhibit suspicious patterns
## Configuration
### Environment Variables
```bash
# No direct environment variables for rate limiting
# Configured in code - can be extended to use env vars
```
### Programmatic Configuration
Rate limits can be adjusted in `rate_limiter.py`:
```python
self.endpoint_limits = {
'/transcribe': {
'requests_per_minute': 10,
'requests_per_hour': 100,
'burst_size': 3,
'token_refresh_rate': 0.167,
'max_request_size': 10 * 1024 * 1024 # 10MB
}
}
```
## Admin Endpoints
### Get Rate Limit Configuration
```bash
curl -H "X-Admin-Token: your-admin-token" \
http://localhost:5005/admin/rate-limits
```
### Get Rate Limit Statistics
```bash
# Global stats
curl -H "X-Admin-Token: your-admin-token" \
http://localhost:5005/admin/rate-limits/stats
# Client-specific stats
curl -H "X-Admin-Token: your-admin-token" \
http://localhost:5005/admin/rate-limits/stats?client_id=abc123
```
### Block IP Address
```bash
# Temporary block (1 hour)
curl -X POST -H "X-Admin-Token: your-admin-token" \
-H "Content-Type: application/json" \
-d '{"ip": "192.168.1.100", "duration": 3600}' \
http://localhost:5005/admin/block-ip
# Permanent block
curl -X POST -H "X-Admin-Token: your-admin-token" \
-H "Content-Type: application/json" \
-d '{"ip": "192.168.1.100", "permanent": true}' \
http://localhost:5005/admin/block-ip
```
## Algorithm Details
### Token Bucket
- Each client gets a bucket with configurable burst size
- Tokens regenerate at a fixed rate
- Requests consume tokens
- Empty bucket = request denied
### Sliding Window
- Tracks requests in the last minute and hour
- More accurate than fixed windows
- Prevents gaming the system at window boundaries
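The token bucket half of the algorithm can be sketched as follows, using the `/transcribe` settings from above (burst size 3, refresh rate 0.167 tokens/second):

```python
import time

class TokenBucket:
    """Minimal token bucket: capacity `burst_size`, refilled at `refresh_rate` tokens/sec."""

    def __init__(self, burst_size, refresh_rate):
        self.capacity = burst_size
        self.tokens = float(burst_size)
        self.refresh_rate = refresh_rate
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Regenerate tokens for the elapsed interval, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refresh_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # empty bucket = request denied

# /transcribe settings: burst of 3, one token every ~6 seconds
bucket = TokenBucket(burst_size=3, refresh_rate=0.167)
results = [bucket.allow() for _ in range(5)]  # burst absorbed, then denied
```

The production limiter layers the sliding-window counters on top of this bucket; the sketch shows only the burst/refill mechanics.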
## Best Practices
### For Users
1. Implement exponential backoff when receiving 429 errors
2. Check rate limit headers to avoid hitting limits
3. Cache responses when possible
4. Use bulk operations where available
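The exponential backoff recommended in step 1 can be sketched like this, where `request_fn` is a hypothetical caller-supplied function returning a dict with the response status and, on 429, the `Retry-After` value:

```python
import random
import time

def with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Call request_fn, retrying on 429 responses with exponential backoff."""
    for attempt in range(max_retries):
        response = request_fn()
        if response.get("status") != 429:
            return response
        # Prefer the server's Retry-After hint; otherwise back off
        # exponentially (1s, 2s, 4s, ...) with jitter to avoid thundering herds
        delay = response.get("retry_after") or (
            base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
        )
        time.sleep(delay)
    raise RuntimeError("still rate limited after %d retries" % max_retries)
```

Honoring `Retry-After` when the server provides it converges faster than blind doubling and keeps clients within the published limits.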
### For Administrators
1. Monitor rate limit statistics regularly
2. Adjust limits based on usage patterns
3. Use IP blocking sparingly
4. Set up alerts for suspicious activity
## Error Responses
### Rate Limited (429)
```json
{
"error": "Rate limit exceeded (per minute)",
"retry_after": 60
}
```
### Request Too Large (413)
```json
{
"error": "Request too large"
}
```
### IP Blocked (429)
```json
{
"error": "IP temporarily blocked due to excessive requests"
}
```
## Monitoring
Key metrics to monitor:
- Rate limit hits by endpoint
- Blocked IPs
- Concurrent request peaks
- Request size violations
- Global limit approaches
## Performance Impact
- Minimal overhead (~1-2ms per request)
- Memory usage scales with active clients
- Automatic cleanup of old buckets
- Thread-safe implementation
## Security Considerations
1. **DoS Protection**: Prevents resource exhaustion
2. **Burst Control**: Limits sudden traffic spikes
3. **Size Validation**: Prevents large payload attacks
4. **IP Blocking**: Stops persistent attackers
5. **Global Limits**: Protects overall system capacity
## Troubleshooting
### "Rate limit exceeded" errors
- Check client request patterns
- Verify time synchronization
- Look for retry loops
- Check IP blocking status
### Memory usage increasing
- Verify cleanup thread is running
- Check for client ID explosion
- Monitor bucket count
### Legitimate users blocked
- Review rate limit settings
- Check for shared IP issues
- Implement IP whitelisting if needed

README.md

@@ -1,9 +1,30 @@
# Voice Language Translator
# Talk2Me - Real-Time Voice Language Translator
A mobile-friendly web application that translates spoken language between multiple languages using:
- Gemma 3 open-source LLM via Ollama for translation
- OpenAI Whisper for speech-to-text
- OpenAI Edge TTS for text-to-speech
A production-ready, mobile-friendly web application that provides real-time translation of spoken language between multiple languages.
## Features
- **Real-time Speech Recognition**: Powered by OpenAI Whisper with GPU acceleration
- **Advanced Translation**: Using Gemma 3 open-source LLM via Ollama
- **Natural Text-to-Speech**: OpenAI Edge TTS for lifelike voice output
- **Progressive Web App**: Full offline support with service workers
- **Multi-Speaker Support**: Track and translate conversations with multiple participants
- **Enterprise Security**: Comprehensive rate limiting, session management, and encrypted secrets
- **Production Ready**: Docker support, load balancing, and extensive monitoring
## Table of Contents
- [Supported Languages](#supported-languages)
- [Quick Start](#quick-start)
- [Installation](#installation)
- [Configuration](#configuration)
- [Security Features](#security-features)
- [Production Deployment](#production-deployment)
- [API Documentation](#api-documentation)
- [Development](#development)
- [Monitoring & Operations](#monitoring--operations)
- [Troubleshooting](#troubleshooting)
- [Contributing](#contributing)
## Supported Languages
@@ -22,68 +43,135 @@ A mobile-friendly web application that translates spoken language between multip
- Turkish
- Uzbek
## Setup Instructions
## Quick Start
1. Install the required Python packages:
```
```bash
# Clone the repository
git clone https://github.com/yourusername/talk2me.git
cd talk2me
# Install dependencies
pip install -r requirements.txt
npm install
# Initialize secure configuration
python manage_secrets.py init
python manage_secrets.py set TTS_API_KEY your-api-key-here
# Ensure Ollama is running with Gemma
ollama pull gemma2:9b
ollama pull gemma3:27b
# Start the application
python app.py
```
Open your browser and navigate to `http://localhost:5005`
## Installation
### Prerequisites
- Python 3.8+
- Node.js 14+
- Ollama (for LLM translation)
- OpenAI Edge TTS server
- Optional: NVIDIA GPU with CUDA, AMD GPU with ROCm, or Apple Silicon
### Detailed Setup
1. **Install Python dependencies**:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
2. Configure secrets and environment:
2. **Install Node.js dependencies**:
```bash
# Initialize secure secrets management
python manage_secrets.py init
# Set required secrets
python manage_secrets.py set TTS_API_KEY
# Or use traditional .env file
cp .env.example .env
nano .env
npm install
npm run build # Build TypeScript files
```
**⚠️ Security Note**: Talk2Me includes encrypted secrets management. See [SECURITY.md](SECURITY.md) and [SECRETS_MANAGEMENT.md](SECRETS_MANAGEMENT.md) for details.
3. **Configure GPU Support** (Optional):
```bash
# For NVIDIA GPUs
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
3. Make sure you have Ollama installed and the Gemma 3 model loaded:
```
ollama pull gemma3
# For AMD GPUs (ROCm)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
# For Apple Silicon
pip install torch torchvision torchaudio
```
4. Ensure your OpenAI Edge TTS server is running on port 5050.
4. **Set up Ollama**:
```bash
# Install Ollama (https://ollama.ai)
curl -fsSL https://ollama.ai/install.sh | sh
5. Run the application:
```
python app.py
# Pull required models
ollama pull gemma2:9b # Faster, for streaming
ollama pull gemma3:27b # Better quality
```
6. Open your browser and navigate to:
```
http://localhost:8000
```
5. **Configure TTS Server**:
Ensure your OpenAI Edge TTS server is running. By default it is expected at `http://localhost:5050`
## Usage
## Configuration
1. Select your source language from the dropdown menu
2. Press the microphone button and speak
3. Press the button again to stop recording
4. Wait for the transcription to complete
5. Select your target language
6. Press the "Translate" button
7. Use the play buttons to hear the original or translated text
### Environment Variables
## Technical Details
Talk2Me uses encrypted secrets management for sensitive configuration. You can use either the secure secrets system or traditional environment variables.
- The app uses Flask for the web server
- Audio is processed client-side using the MediaRecorder API
- Whisper for speech recognition with language hints
- Ollama provides access to the Gemma 3 model for translation
- OpenAI Edge TTS delivers natural-sounding speech output
#### Using Secure Secrets Management (Recommended)
## CORS Configuration
```bash
# Initialize the secrets system
python manage_secrets.py init
The application supports Cross-Origin Resource Sharing (CORS) for secure cross-origin usage. See [CORS_CONFIG.md](CORS_CONFIG.md) for detailed configuration instructions.
# Set required secrets
python manage_secrets.py set TTS_API_KEY
python manage_secrets.py set TTS_SERVER_URL
python manage_secrets.py set ADMIN_TOKEN
# List all secrets
python manage_secrets.py list
# Rotate encryption keys
python manage_secrets.py rotate
```
#### Using Environment Variables
Create a `.env` file:
```env
# Core Configuration
TTS_API_KEY=your-api-key-here
TTS_SERVER_URL=http://localhost:5050/v1/audio/speech
ADMIN_TOKEN=your-secure-admin-token
# CORS Configuration
CORS_ORIGINS=https://yourdomain.com,https://app.yourdomain.com
ADMIN_CORS_ORIGINS=https://admin.yourdomain.com
# Security Settings
SECRET_KEY=your-secret-key-here
MAX_CONTENT_LENGTH=52428800 # 50MB
SESSION_LIFETIME=3600 # 1 hour
RATE_LIMIT_STORAGE_URL=redis://localhost:6379/0
# Performance Tuning
WHISPER_MODEL_SIZE=base
GPU_MEMORY_THRESHOLD_MB=2048
MEMORY_CLEANUP_INTERVAL=30
```
### Advanced Configuration
#### CORS Settings
Quick setup:
```bash
# Development (allow all origins)
export CORS_ORIGINS="*"
@@ -93,88 +181,638 @@ export CORS_ORIGINS="https://yourdomain.com,https://app.yourdomain.com"
export ADMIN_CORS_ORIGINS="https://admin.yourdomain.com"
```
## Connection Retry & Offline Support
#### Rate Limiting
Talk2Me handles network interruptions gracefully with automatic retry logic:
- Automatic request queuing during connection loss
- Exponential backoff retry with configurable parameters
- Visual connection status indicators
- Priority-based request processing
Configure per-endpoint rate limits:
See [CONNECTION_RETRY.md](CONNECTION_RETRY.md) for detailed documentation.
```python
# In your config or via admin API
RATE_LIMITS = {
'default': {'requests_per_minute': 30, 'requests_per_hour': 500},
'transcribe': {'requests_per_minute': 10, 'requests_per_hour': 100},
'translate': {'requests_per_minute': 20, 'requests_per_hour': 300}
}
```
## Rate Limiting
#### Session Management
Comprehensive rate limiting protects against DoS attacks and resource exhaustion:
```python
SESSION_CONFIG = {
'max_file_size_mb': 100,
'max_files_per_session': 100,
'idle_timeout_minutes': 15,
'max_lifetime_minutes': 60
}
```
## Security Features
### 1. Rate Limiting
Comprehensive DoS protection with:
- Token bucket algorithm with sliding window
- Per-endpoint configurable limits
- Automatic IP blocking for abusive clients
- Global request limits and concurrent request throttling
- Request size validation
See [RATE_LIMITING.md](RATE_LIMITING.md) for detailed documentation.
```bash
# Check rate limit status
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/admin/rate-limits
## Session Management
# Block an IP
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"ip": "192.168.1.100", "duration": 3600}' \
http://localhost:5005/admin/block-ip
```
Advanced session management prevents resource leaks from abandoned sessions:
- Automatic tracking of all session resources (audio files, temp files)
- Per-session resource limits (100 files, 100MB)
- Automatic cleanup of idle sessions (15 minutes) and expired sessions (1 hour)
- Real-time monitoring and metrics
- Manual cleanup capabilities for administrators
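The cleanup sweep described above can be sketched as a periodic task (a hypothetical structure; the real session store also tracks per-session files that must be deleted alongside the session record):

```python
import time

def sweep_sessions(sessions, idle_timeout=15 * 60, max_lifetime=60 * 60):
    """Drop sessions idle for 15+ minutes or older than 1 hour.

    `sessions` maps session id -> {"last_active": ts, "created": ts}.
    Returns the ids that were removed.
    """
    now = time.time()
    expired = [sid for sid, s in sessions.items()
               if now - s["last_active"] > idle_timeout
               or now - s["created"] > max_lifetime]
    for sid in expired:
        del sessions[sid]
    return expired
```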
### 2. Secrets Management
See [SESSION_MANAGEMENT.md](SESSION_MANAGEMENT.md) for detailed documentation.
- AES-128 encryption for sensitive data
- Automatic key rotation
- Audit logging
- Platform-specific secure storage
## Request Size Limits
```bash
# View audit log
python manage_secrets.py audit
Comprehensive request size limiting prevents memory exhaustion:
- Global limit: 50MB for any request
- Audio files: 25MB maximum
- JSON payloads: 1MB maximum
- File type detection and enforcement
- Dynamic configuration via admin API
# Backup secrets
python manage_secrets.py export --output backup.enc
See [REQUEST_SIZE_LIMITS.md](REQUEST_SIZE_LIMITS.md) for detailed documentation.
# Restore from backup
python manage_secrets.py import --input backup.enc
```
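The request size enforcement described above can be sketched as a Flask `before_request` hook (a simplified sketch, not the app's actual middleware; the real checks also inspect content type and per-endpoint configuration):

```python
from flask import Flask, abort, request

app = Flask(__name__)
MAX_JSON_BYTES = 1 * 1024 * 1024     # 1MB for JSON payloads
MAX_AUDIO_BYTES = 25 * 1024 * 1024   # 25MB for audio uploads

@app.before_request
def enforce_size_limits():
    # Reject oversized bodies up front, before any handler reads them
    limit = MAX_AUDIO_BYTES if request.path == "/transcribe" else MAX_JSON_BYTES
    if (request.content_length or 0) > limit:
        abort(413)  # Payload Too Large
```

Checking `Content-Length` in `before_request` means the body is never buffered for rejected requests, which is the point of the memory-exhaustion protection.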
## Error Logging
### 3. Session Management
Production-ready error logging system for debugging and monitoring:
- Structured JSON logs for easy parsing
- Multiple log streams (app, errors, access, security, performance)
- Automatic log rotation to prevent disk exhaustion
- Request tracing with unique IDs
- Performance metrics and slow request tracking
- Admin endpoints for log analysis
- Automatic resource tracking
- Per-session limits (100 files, 100MB)
- Idle session cleanup (15 minutes)
- Real-time monitoring
See [ERROR_LOGGING.md](ERROR_LOGGING.md) for detailed documentation.
```bash
# View active sessions
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/admin/sessions
## Memory Management
# Clean up specific session
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
http://localhost:5005/admin/sessions/SESSION_ID/cleanup
```
Comprehensive memory leak prevention for extended use:
- GPU memory management with automatic cleanup
- Whisper model reloading to prevent fragmentation
- Frontend resource tracking (audio blobs, contexts, streams)
- Automatic cleanup of temporary files
- Memory monitoring and manual cleanup endpoints
### 4. Request Size Limits
See [MEMORY_MANAGEMENT.md](MEMORY_MANAGEMENT.md) for detailed documentation.
- Global limit: 50MB
- Audio files: 25MB
- JSON payloads: 1MB
- Dynamic configuration
```bash
# Update size limits
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"max_audio_size": "30MB"}' \
http://localhost:5005/admin/size-limits
```
## Production Deployment
For production use, deploy with a proper WSGI server:
- Gunicorn with optimized worker configuration
- Nginx reverse proxy with caching
- Docker/Docker Compose support
- Systemd service management
- Comprehensive security hardening
### Docker Deployment
Quick start:
```bash
# Build and run with Docker Compose (CPU only)
docker-compose up -d
# With NVIDIA GPU support
docker-compose -f docker-compose.yml -f docker-compose.nvidia.yml up -d
# With AMD GPU support (ROCm)
docker-compose -f docker-compose.yml -f docker-compose.amd.yml up -d
# With Apple Silicon support
docker-compose -f docker-compose.yml -f docker-compose.apple.yml up -d
# Scale web workers
docker-compose up -d --scale talk2me=4
# View logs
docker-compose logs -f talk2me
```
See [PRODUCTION_DEPLOYMENT.md](PRODUCTION_DEPLOYMENT.md) for detailed deployment instructions.
### Docker Compose Configuration
## Mobile Support
Choose the appropriate configuration based on your GPU:
The interface is fully responsive and designed to work well on mobile devices.
#### NVIDIA GPU Configuration
```yaml
version: '3.8'
services:
web:
build: .
ports:
- "5005:5005"
environment:
- GUNICORN_WORKERS=4
- GUNICORN_THREADS=2
volumes:
- ./logs:/app/logs
- whisper-cache:/root/.cache/whisper
deploy:
resources:
limits:
memory: 4G
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
```
#### AMD GPU Configuration (ROCm)
```yaml
version: '3.8'
services:
web:
build: .
ports:
- "5005:5005"
environment:
- GUNICORN_WORKERS=4
- GUNICORN_THREADS=2
- HSA_OVERRIDE_GFX_VERSION=10.3.0 # Adjust for your GPU
volumes:
- ./logs:/app/logs
- whisper-cache:/root/.cache/whisper
- /dev/kfd:/dev/kfd # ROCm KFD interface
- /dev/dri:/dev/dri # Direct Rendering Interface
devices:
- /dev/kfd
- /dev/dri
group_add:
- video
- render
deploy:
resources:
limits:
memory: 4G
```
#### Apple Silicon Configuration
```yaml
version: '3.8'
services:
web:
build: .
platform: linux/arm64/v8 # For M1/M2 Macs
ports:
- "5005:5005"
environment:
- GUNICORN_WORKERS=4
- GUNICORN_THREADS=2
- PYTORCH_ENABLE_MPS_FALLBACK=1 # Enable MPS fallback
volumes:
- ./logs:/app/logs
- whisper-cache:/root/.cache/whisper
deploy:
resources:
limits:
memory: 4G
```
#### CPU-Only Configuration
```yaml
version: '3.8'
services:
web:
build: .
ports:
- "5005:5005"
environment:
- GUNICORN_WORKERS=4
- GUNICORN_THREADS=2
- OMP_NUM_THREADS=4 # OpenMP threads for CPU
volumes:
- ./logs:/app/logs
- whisper-cache:/root/.cache/whisper
deploy:
resources:
limits:
memory: 4G
cpus: '4.0'
```
### Nginx Configuration
```nginx
upstream talk2me {
least_conn;
server web1:5005 weight=1 max_fails=3 fail_timeout=30s;
server web2:5005 weight=1 max_fails=3 fail_timeout=30s;
}
server {
listen 443 ssl http2;
server_name talk2me.yourdomain.com;
ssl_certificate /etc/ssl/certs/talk2me.crt;
ssl_certificate_key /etc/ssl/private/talk2me.key;
client_max_body_size 50M;
location / {
proxy_pass http://talk2me;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $host;
# WebSocket support
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
# Cache static assets
location /static/ {
alias /app/static/;
expires 30d;
add_header Cache-Control "public, immutable";
}
}
```
### Systemd Service
```ini
[Unit]
Description=Talk2Me Translation Service
After=network.target
[Service]
Type=notify
User=talk2me
Group=talk2me
WorkingDirectory=/opt/talk2me
Environment="PATH=/opt/talk2me/venv/bin"
ExecStart=/opt/talk2me/venv/bin/gunicorn \
--config gunicorn_config.py \
--bind 0.0.0.0:5005 \
app:app
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
```
## API Documentation
### Core Endpoints
#### Transcribe Audio
```http
POST /transcribe
Content-Type: multipart/form-data
audio: (binary)
source_lang: auto|language_code
```
#### Translate Text
```http
POST /translate
Content-Type: application/json
{
"text": "Hello world",
"source_lang": "English",
"target_lang": "Spanish"
}
```
#### Streaming Translation
```http
POST /translate/stream
Content-Type: application/json
{
"text": "Long text to translate",
"source_lang": "auto",
"target_lang": "French"
}
Response: Server-Sent Events stream
```
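A minimal Python consumer for this SSE stream might look like the following (the endpoint shape matches the documentation above; `requests` is assumed to be available):

```python
def parse_sse_data(lines):
    """Extract payloads from an iterable of SSE lines ("data: ..." frames)."""
    return [line[len("data: "):] for line in lines
            if line and line.startswith("data: ")]

def stream_translation(text, source_lang, target_lang,
                       base_url="http://localhost:5005"):
    """Yield translated chunks from /translate/stream as they arrive."""
    import requests  # imported here so the parser above stays dependency-free
    resp = requests.post(
        f"{base_url}/translate/stream",
        json={"text": text, "source_lang": source_lang, "target_lang": target_lang},
        stream=True,
    )
    resp.raise_for_status()
    yield from parse_sse_data(resp.iter_lines(decode_unicode=True))
```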
#### Text-to-Speech
```http
POST /speak
Content-Type: application/json
{
"text": "Hola mundo",
"language": "Spanish"
}
```
### Admin Endpoints
All admin endpoints require `X-Admin-Token` header.
#### Health & Monitoring
- `GET /health` - Basic health check
- `GET /health/detailed` - Component status
- `GET /metrics` - Prometheus metrics
- `GET /admin/memory` - Memory usage stats
#### Session Management
- `GET /admin/sessions` - List active sessions
- `GET /admin/sessions/:id` - Session details
- `POST /admin/sessions/:id/cleanup` - Manual cleanup
#### Security Controls
- `GET /admin/rate-limits` - View rate limits
- `POST /admin/block-ip` - Block IP address
- `GET /admin/logs/security` - Security events
## Development
### TypeScript Development
```bash
# Install dependencies
npm install
# Development mode with auto-compilation
npm run dev
# Build for production
npm run build
# Type checking
npm run typecheck
```
### Project Structure
```
talk2me/
├── app.py # Main Flask application
├── config.py # Configuration management
├── requirements.txt # Python dependencies
├── package.json # Node.js dependencies
├── tsconfig.json # TypeScript configuration
├── gunicorn_config.py # Production server config
├── docker-compose.yml # Container orchestration
├── static/
│ ├── js/
│ │ ├── src/ # TypeScript source files
│ │ └── dist/ # Compiled JavaScript
│ ├── css/ # Stylesheets
│ └── icons/ # PWA icons
├── templates/ # HTML templates
├── logs/ # Application logs
└── tests/ # Test suite
```
### Key Components
1. **Connection Management** (`connectionManager.ts`)
- Automatic retry with exponential backoff
- Request queuing during offline periods
- Connection status monitoring
2. **Translation Cache** (`translationCache.ts`)
- IndexedDB for offline support
- LRU eviction policy
- Automatic cache size management
3. **Speaker Management** (`speakerManager.ts`)
- Multi-speaker conversation tracking
- Speaker-specific audio handling
- Conversation export functionality
4. **Error Handling** (`errorBoundary.ts`)
- Global error catching
- Automatic error reporting
- User-friendly error messages
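The LRU eviction policy used by the translation cache can be sketched as follows (in Python for brevity; the actual implementation is TypeScript over IndexedDB):

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least-recently-used entry once capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the oldest entry
```

Because reads refresh an entry's position, frequently replayed translations survive eviction while one-off lookups age out first.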
### Running Tests
```bash
# Python tests
pytest tests/ -v
# TypeScript tests
npm test
# Integration tests
python test_integration.py
```
## Monitoring & Operations
### Logging System
Talk2Me uses structured JSON logging with multiple streams:
```
logs/
├── talk2me.log # General application log
├── errors.log # Error-specific log
├── access.log # HTTP access log
├── security.log # Security events
└── performance.log # Performance metrics
```
View logs:
```bash
# Recent errors
tail -f logs/errors.log | jq '.'
# Security events
grep "rate_limit_exceeded" logs/security.log | jq '.'
# Slow requests
jq 'select(.extra_fields.duration_ms > 1000)' logs/performance.log
```
### Memory Management
Talk2Me includes comprehensive memory leak prevention:
1. **Backend Memory Management**
- GPU memory monitoring
- Automatic model reloading
- Temporary file cleanup
2. **Frontend Memory Management**
- Audio blob cleanup
- WebRTC resource management
- Event listener cleanup
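The backend GPU cleanup can be sketched like this (a hedged sketch of the general pattern, not the app's exact cleanup routine; it is safe on CPU-only hosts since torch is optional here):

```python
import gc

def cleanup_gpu_memory():
    """Release cached GPU memory after heavy transcription work."""
    gc.collect()  # drop Python-level references first
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()   # return cached blocks to the driver
            torch.cuda.synchronize()   # wait for pending kernels to finish
    except ImportError:
        pass  # torch not installed: nothing GPU-side to release
```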
Monitor memory:
```bash
# Check memory stats
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/admin/memory
# Trigger manual cleanup
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
http://localhost:5005/admin/memory/cleanup
```
### Performance Tuning
#### GPU Optimization
```python
# config.py or environment
GPU_OPTIMIZATIONS = {
'enabled': True,
'fp16': True, # Half precision for 2x speedup
'batch_size': 1, # Adjust based on GPU memory
'num_workers': 2, # Parallel data loading
'pin_memory': True # Faster GPU transfer
}
```
#### Whisper Optimization
```python
TRANSCRIBE_OPTIONS = {
'beam_size': 1, # Faster inference
'best_of': 1, # Disable multiple attempts
'temperature': 0, # Deterministic output
'compression_ratio_threshold': 2.4,
'logprob_threshold': -1.0,
'no_speech_threshold': 0.6
}
```
### Scaling Considerations
1. **Horizontal Scaling**
- Use Redis for shared rate limiting
- Configure sticky sessions for WebSocket
- Share audio files via object storage
2. **Vertical Scaling**
- Increase worker processes
- Tune thread pool size
- Allocate more GPU memory
3. **Caching Strategy**
- Cache translations in Redis
- Use CDN for static assets
- Enable HTTP caching headers
## Troubleshooting
### Common Issues
#### GPU Not Detected
```bash
# Check CUDA availability
python -c "import torch; print(torch.cuda.is_available())"
# Check GPU memory
nvidia-smi
# For AMD GPUs
rocm-smi
# For Apple Silicon
python -c "import torch; print(torch.backends.mps.is_available())"
```
#### High Memory Usage
```bash
# Check for memory leaks
curl -H "X-Admin-Token: $ADMIN_TOKEN" http://localhost:5005/health/storage
# Manual cleanup
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" \
http://localhost:5005/admin/cleanup
```
#### CORS Issues
```bash
# Test CORS configuration
curl -X OPTIONS http://localhost:5005/api/transcribe \
-H "Origin: https://yourdomain.com" \
-H "Access-Control-Request-Method: POST"
```
#### TTS Server Connection
```bash
# Check TTS server status
curl http://localhost:5005/check_tts_server
# Update TTS configuration
curl -X POST http://localhost:5005/update_tts_config \
-H "Content-Type: application/json" \
-d '{"server_url": "http://localhost:5050/v1/audio/speech", "api_key": "new-key"}'
```
### Debug Mode
Enable debug logging:
```bash
export FLASK_ENV=development
export LOG_LEVEL=DEBUG
python app.py
```
### Performance Profiling
```bash
# Enable performance logging
export ENABLE_PROFILING=true
# View slow requests
jq 'select(.duration_ms > 1000)' logs/performance.log
```
## Contributing
We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.
### Development Setup
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Run tests (`pytest && npm test`)
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request
### Code Style
- Python: Follow PEP 8
- TypeScript: Use ESLint configuration
- Commit messages: Use conventional commits
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- OpenAI Whisper team for the amazing speech recognition model
- Ollama team for making LLMs accessible
- All contributors who have helped improve Talk2Me
## Support
- **Documentation**: Full docs at [docs.talk2me.app](https://docs.talk2me.app)
- **Issues**: [GitHub Issues](https://github.com/yourusername/talk2me/issues)
- **Discussions**: [GitHub Discussions](https://github.com/yourusername/talk2me/discussions)
- **Security**: Please report security vulnerabilities to security@talk2me.app



@@ -1,54 +0,0 @@
# TypeScript Setup for Talk2Me
This project now includes TypeScript support for better type safety and developer experience.
## Installation
1. Install Node.js dependencies:
```bash
npm install
```
2. Build TypeScript files:
```bash
npm run build
```
## Development
For development with automatic recompilation:
```bash
npm run watch
# or
npm run dev
```
## Project Structure
- `/static/js/src/` - TypeScript source files
- `app.ts` - Main application logic
- `types.ts` - Type definitions
- `/static/js/dist/` - Compiled JavaScript files (git-ignored)
- `tsconfig.json` - TypeScript configuration
- `package.json` - Node.js dependencies and scripts
## Available Scripts
- `npm run build` - Compile TypeScript to JavaScript
- `npm run watch` - Watch for changes and recompile
- `npm run dev` - Same as watch
- `npm run clean` - Remove compiled files
- `npm run type-check` - Type-check without compiling
## Type Safety Benefits
The TypeScript implementation provides:
- Compile-time type checking
- Better IDE support with autocomplete
- Explicit interface definitions for API responses
- Safer refactoring
- Self-documenting code
## Next Steps
After building, the compiled JavaScript will be in `/static/js/dist/app.js` and will be automatically loaded by the HTML template.


@@ -1,332 +0,0 @@
# Request Size Limits Documentation
This document describes the request size limiting system implemented in Talk2Me to prevent memory exhaustion from large uploads.
## Overview
Talk2Me implements comprehensive request size limiting to protect against:
- Memory exhaustion from large file uploads
- Denial of Service (DoS) attacks using oversized requests
- Buffer overflow attempts
- Resource starvation from unbounded requests
## Default Limits
### Global Limits
- **Maximum Content Length**: 50MB - Absolute maximum for any request
- **Maximum Audio File Size**: 25MB - For audio uploads (transcription)
- **Maximum JSON Payload**: 1MB - For API requests
- **Maximum Image Size**: 10MB - For future image processing features
- **Maximum Chunk Size**: 1MB - For streaming uploads
## Features
### 1. Multi-Layer Protection
The system implements multiple layers of size checking:
- Flask's built-in `MAX_CONTENT_LENGTH` configuration
- Pre-request validation before data is loaded into memory
- File-type specific limits
- Endpoint-specific limits
- Streaming request monitoring
### 2. File Type Detection
Automatic detection and enforcement based on file extensions:
- Audio files: `.wav`, `.mp3`, `.ogg`, `.webm`, `.m4a`, `.flac`, `.aac`
- Image files: `.jpg`, `.jpeg`, `.png`, `.gif`, `.webp`, `.bmp`
- JSON payloads: Content-Type header detection
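Extension-based detection of this kind could be sketched as follows (illustrative only; the actual middleware may classify requests differently):

```python
import os

# Extension groups mirroring the lists above
AUDIO_EXTENSIONS = {'.wav', '.mp3', '.ogg', '.webm', '.m4a', '.flac', '.aac'}
IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.gif', '.webp', '.bmp'}

def classify_upload(filename: str, content_type: str = '') -> str:
    """Pick the size-limit category for an incoming request."""
    ext = os.path.splitext(filename or '')[1].lower()
    if ext in AUDIO_EXTENSIONS:
        return 'audio'
    if ext in IMAGE_EXTENSIONS:
        return 'image'
    if content_type.startswith('application/json'):
        return 'json'
    return 'default'
```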
### 3. Graceful Error Handling
When limits are exceeded:
- Returns 413 (Request Entity Too Large) status code
- Provides clear error messages with size information
- Includes both actual and allowed sizes
- Human-readable size formatting
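The behavior above, a 413 body carrying both the actual and allowed sizes plus human-readable formatting, can be sketched with two small helpers (hypothetical names; the real middleware may differ):

```python
def format_size(num_bytes: float) -> str:
    """Render a byte count in human-readable form, e.g. 52428800 -> '50.0MB'."""
    for unit in ('B', 'KB', 'MB', 'GB'):
        if num_bytes < 1024 or unit == 'GB':
            return f"{int(num_bytes)}B" if unit == 'B' else f"{num_bytes:.1f}{unit}"
        num_bytes /= 1024

def size_limit_error(actual: int, allowed: int, label: str = 'Request'):
    """Build the 413 response body: clear message plus actual and allowed sizes."""
    return {
        'error': f'{label} too large',
        'max_size': allowed,
        'your_size': actual,
        'max_size_mb': round(allowed / 1024 / 1024, 1),
    }, 413
```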
## Configuration
### Environment Variables
```bash
# Set limits via environment variables (in bytes)
export MAX_CONTENT_LENGTH=52428800 # 50MB
export MAX_AUDIO_SIZE=26214400 # 25MB
export MAX_JSON_SIZE=1048576 # 1MB
export MAX_IMAGE_SIZE=10485760 # 10MB
```
### Flask Configuration
```python
# In config.py or app.py
app.config.update({
    'MAX_CONTENT_LENGTH': 50 * 1024 * 1024,  # 50MB
    'MAX_AUDIO_SIZE': 25 * 1024 * 1024,      # 25MB
    'MAX_JSON_SIZE': 1 * 1024 * 1024,        # 1MB
    'MAX_IMAGE_SIZE': 10 * 1024 * 1024       # 10MB
})
```
### Dynamic Configuration
Size limits can be updated at runtime via admin API.
## API Endpoints
### GET /admin/size-limits
Get current size limits.
```bash
curl -H "X-Admin-Token: your-token" http://localhost:5005/admin/size-limits
```
Response:
```json
{
  "limits": {
    "max_content_length": 52428800,
    "max_audio_size": 26214400,
    "max_json_size": 1048576,
    "max_image_size": 10485760
  },
  "limits_human": {
    "max_content_length": "50.0MB",
    "max_audio_size": "25.0MB",
    "max_json_size": "1.0MB",
    "max_image_size": "10.0MB"
  }
}
```
### POST /admin/size-limits
Update size limits dynamically.
```bash
curl -X POST -H "X-Admin-Token: your-token" \
-H "Content-Type: application/json" \
-d '{"max_audio_size": "30MB", "max_json_size": 2097152}' \
http://localhost:5005/admin/size-limits
```
Response:
```json
{
  "success": true,
  "old_limits": {...},
  "new_limits": {...},
  "new_limits_human": {
    "max_audio_size": "30.0MB",
    "max_json_size": "2.0MB"
  }
}
```
## Usage Examples
### 1. Endpoint-Specific Limits
```python
@app.route('/upload')
@limit_request_size(max_size=10*1024*1024)  # 10MB limit
def upload():
    # Handle upload
    pass

@app.route('/upload-audio')
@limit_request_size(max_audio_size=30*1024*1024)  # 30MB for audio
def upload_audio():
    # Handle audio upload
    pass
```
### 2. Client-Side Validation
```javascript
// Check file size before upload
const MAX_AUDIO_SIZE = 25 * 1024 * 1024; // 25MB
function validateAudioFile(file) {
    if (file.size > MAX_AUDIO_SIZE) {
        alert(`Audio file too large. Maximum size is ${MAX_AUDIO_SIZE / 1024 / 1024}MB`);
        return false;
    }
    return true;
}
```
### 3. Chunked Uploads (Future Enhancement)
```javascript
// For files larger than limits, use chunked upload
async function uploadLargeFile(file, chunkSize = 1024 * 1024) {
    const chunks = Math.ceil(file.size / chunkSize);
    for (let i = 0; i < chunks; i++) {
        const start = i * chunkSize;
        const end = Math.min(start + chunkSize, file.size);
        const chunk = file.slice(start, end);
        await uploadChunk(chunk, i, chunks);
    }
}
```
## Error Responses
### 413 Request Entity Too Large
When a request exceeds size limits:
```json
{
  "error": "Request too large",
  "max_size": 52428800,
  "your_size": 75000000,
  "max_size_mb": 50.0
}
```
### File-Specific Errors
For audio files:
```json
{
  "error": "Audio file too large",
  "max_size": 26214400,
  "your_size": 35000000,
  "max_size_mb": 25.0
}
```
For JSON payloads:
```json
{
  "error": "JSON payload too large",
  "max_size": 1048576,
  "your_size": 2000000,
  "max_size_kb": 1024.0
}
```
## Best Practices
### 1. Client-Side Validation
Always validate file sizes on the client side:
```javascript
// Add to static/js/app.js
const SIZE_LIMITS = {
    audio: 25 * 1024 * 1024,  // 25MB
    json: 1 * 1024 * 1024,    // 1MB
};

function checkFileSize(file, type) {
    const limit = SIZE_LIMITS[type];
    if (file.size > limit) {
        showError(`File too large. Maximum size: ${formatSize(limit)}`);
        return false;
    }
    return true;
}
```
### 2. Progressive Enhancement
For better UX with large files:
- Show upload progress
- Implement resumable uploads
- Compress audio client-side when possible
- Use appropriate audio formats (WebM/Opus for smaller sizes)
### 3. Server Configuration
Configure your web server (Nginx/Apache) to also enforce limits:
**Nginx:**
```nginx
client_max_body_size 50M;
client_body_buffer_size 1M;
```
**Apache:**
```apache
LimitRequestBody 52428800
```
### 4. Monitoring
Monitor size limit violations:
- Track 413 errors in logs
- Alert on repeated violations from same IP
- Adjust limits based on usage patterns
## Security Considerations
1. **Memory Protection**: Pre-flight size checks prevent loading large files into memory
2. **DoS Prevention**: Limits prevent attackers from exhausting server resources
3. **Bandwidth Protection**: Prevents bandwidth exhaustion from large uploads
4. **Storage Protection**: Works with session management to limit total storage per user
## Integration with Other Systems
### Rate Limiting
Size limits work in conjunction with rate limiting:
- Large requests count more against rate limits
- Repeated size violations can trigger IP blocking
### Session Management
Size limits are enforced per session:
- Total storage per session is limited
- Large files count against session resource limits
### Monitoring
Size limit violations are tracked in:
- Application logs
- Health check endpoints
- Admin monitoring dashboards
## Troubleshooting
### Common Issues
#### 1. Legitimate Large Files Rejected
If users need to upload larger files:
```bash
# Increase limit for audio files to 50MB
curl -X POST -H "X-Admin-Token: token" \
-d '{"max_audio_size": "50MB"}' \
http://localhost:5005/admin/size-limits
```
#### 2. Chunked Transfer Encoding
For requests without Content-Length header:
- The system monitors the stream
- Terminates connection if size exceeded
- May require special handling for some clients
#### 3. Load Balancer Limits
Ensure your load balancer also enforces appropriate limits:
- AWS ALB: Configure request size limits
- Cloudflare: Set upload size limits
- Nginx: Configure client_max_body_size
## Performance Impact
The size limiting system has minimal performance impact:
- Pre-flight checks are O(1) operations
- No buffering of large requests
- Early termination of oversized requests
- Efficient memory usage
## Future Enhancements
1. **Chunked Upload Support**: Native support for resumable uploads
2. **Compression Detection**: Automatic handling of compressed uploads
3. **Dynamic Limits**: Per-user or per-tier size limits
4. **Bandwidth Throttling**: Rate limit large uploads
5. **Storage Quotas**: Long-term storage limits per user


@@ -1,411 +0,0 @@
# Secrets Management Documentation
This document describes the secure secrets management system implemented in Talk2Me.
## Overview
Talk2Me uses a comprehensive secrets management system that provides:
- Encrypted storage of sensitive configuration
- Secret rotation capabilities
- Audit logging
- Integrity verification
- CLI management tools
- Environment variable integration
## Architecture
### Components
1. **SecretsManager** (`secrets_manager.py`)
   - Handles encryption/decryption using Fernet (AES-128)
   - Manages secret lifecycle (create, read, update, delete)
   - Provides audit logging
   - Supports secret rotation
2. **Configuration System** (`config.py`)
   - Integrates secrets with Flask configuration
   - Environment-specific configurations
   - Validation and sanitization
3. **CLI Tool** (`manage_secrets.py`)
   - Command-line interface for secret management
   - Interactive and scriptable
### Security Features
- **Encryption**: AES-128 encryption using cryptography.fernet
- **Key Derivation**: PBKDF2 with SHA256 (100,000 iterations)
- **Master Key**: Stored separately with restricted permissions
- **Audit Trail**: All access and modifications logged
- **Integrity Checks**: Verify secrets haven't been tampered with
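As an illustration of the key-derivation step, PBKDF2-HMAC-SHA256 at 100,000 iterations can be reproduced with the standard library alone; Fernet expects the resulting 32-byte key urlsafe-base64-encoded. (The project uses the `cryptography` package for this; the stdlib sketch below shows only the derivation, with hypothetical inputs.)

```python
import base64
import hashlib

def derive_fernet_key(master_password: bytes, salt: bytes) -> bytes:
    """Derive a 32-byte key via PBKDF2-HMAC-SHA256 (100,000 iterations),
    urlsafe-base64-encoded as Fernet requires."""
    raw = hashlib.pbkdf2_hmac('sha256', master_password, salt, 100_000, dklen=32)
    return base64.urlsafe_b64encode(raw)
```

The same master password with the same salt always yields the same key, which is what lets the master key file decrypt the stored secrets across restarts.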
## Quick Start
### 1. Initialize Secrets
```bash
python manage_secrets.py init
```
This will:
- Generate a master encryption key
- Create initial secrets (Flask secret key, admin token)
- Prompt for required secrets (TTS API key)
### 2. Set a Secret
```bash
# Interactive (hidden input)
python manage_secrets.py set TTS_API_KEY
# Direct (be careful with shell history)
python manage_secrets.py set TTS_API_KEY --value "your-api-key"
# With metadata
python manage_secrets.py set API_KEY --value "key" --metadata '{"service": "external-api"}'
```
### 3. List Secrets
```bash
python manage_secrets.py list
```
Output:
```
Key                  Created      Last Rotated   Has Value
----------------------------------------------------------
FLASK_SECRET_KEY     2024-01-15   2024-01-20     ✓
TTS_API_KEY          2024-01-15   Never          ✓
ADMIN_TOKEN          2024-01-15   2024-01-18     ✓
```
### 4. Rotate Secrets
```bash
# Rotate a specific secret
python manage_secrets.py rotate ADMIN_TOKEN
# Check which secrets need rotation
python manage_secrets.py check-rotation
# Schedule automatic rotation
python manage_secrets.py schedule-rotation API_KEY 30 # Every 30 days
```
## Configuration
### Environment Variables
The secrets manager checks these locations in order:
1. Encrypted secrets storage (`.secrets.json`)
2. `SECRET_<KEY>` environment variable
3. `<KEY>` environment variable
4. Default value
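That lookup order might be implemented along these lines (a sketch only, with a plain dict standing in for the encrypted store):

```python
import os

def resolve_secret(key, encrypted_store, default=None):
    """Resolve a secret using the documented precedence:
    encrypted store, then SECRET_<KEY>, then <KEY>, then the default."""
    if key in encrypted_store:
        return encrypted_store[key]
    for env_name in (f'SECRET_{key}', key):
        value = os.environ.get(env_name)
        if value is not None:
            return value
    return default
```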
### Master Key
The master encryption key is loaded from:
1. `MASTER_KEY` environment variable
2. `.master_key` file (default)
3. Auto-generated if neither exists
**Important**: Protect the master key!
- Set file permissions: `chmod 600 .master_key`
- Back it up securely
- Never commit to version control
### Flask Integration
Secrets are automatically loaded into Flask configuration:
```python
# In app.py
from config import init_app as init_config
from secrets_manager import init_app as init_secrets
app = Flask(__name__)
init_config(app)
init_secrets(app)
# Access secrets
api_key = app.config['TTS_API_KEY']
```
## CLI Commands
### Basic Operations
```bash
# List all secrets
python manage_secrets.py list
# Get a secret value (requires confirmation)
python manage_secrets.py get TTS_API_KEY
# Set a secret
python manage_secrets.py set DATABASE_URL
# Delete a secret
python manage_secrets.py delete OLD_API_KEY
# Rotate a secret
python manage_secrets.py rotate ADMIN_TOKEN
```
### Advanced Operations
```bash
# Verify integrity of all secrets
python manage_secrets.py verify
# Migrate from environment variables
python manage_secrets.py migrate
# View audit log
python manage_secrets.py audit
python manage_secrets.py audit TTS_API_KEY --limit 50
# Schedule rotation
python manage_secrets.py schedule-rotation API_KEY 90
```
## Security Best Practices
### 1. File Permissions
```bash
# Secure the secrets files
chmod 600 .secrets.json
chmod 600 .master_key
```
### 2. Backup Strategy
- Back up `.master_key` separately from `.secrets.json`
- Store backups in different secure locations
- Test restore procedures regularly
### 3. Rotation Policy
Recommended rotation intervals:
- API Keys: 90 days
- Admin Tokens: 30 days
- Database Passwords: 180 days
- Encryption Keys: 365 days
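A due-for-rotation check against these intervals could look like this (illustrative; the CLI's `check-rotation` command may be implemented differently):

```python
from datetime import datetime, timedelta

# Recommended rotation intervals from the policy above, in days
ROTATION_INTERVALS = {
    'api_key': 90,
    'admin_token': 30,
    'database_password': 180,
    'encryption_key': 365,
}

def secrets_due_for_rotation(secrets, now=None):
    """Return names of secrets whose last rotation is older than policy allows.

    `secrets` maps name -> (kind, last_rotated: datetime).
    """
    now = now or datetime.utcnow()
    due = []
    for name, (kind, last_rotated) in secrets.items():
        max_age = timedelta(days=ROTATION_INTERVALS.get(kind, 90))
        if now - last_rotated > max_age:
            due.append(name)
    return due
```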
### 4. Access Control
- Use environment-specific secrets
- Implement least privilege access
- Audit secret access regularly
### 5. Git Security
Ensure these files are in `.gitignore`:
```
.secrets.json
.master_key
secrets.db
*.key
```
## Deployment
### Development
```bash
# Use .env file for convenience
cp .env.example .env
# Edit .env with development values
# Initialize secrets
python manage_secrets.py init
```
### Production
```bash
# Set master key via environment
export MASTER_KEY="your-production-master-key"
# Or use a key management service
export MASTER_KEY_FILE="/secure/path/to/master.key"
# Load secrets from secure storage
python manage_secrets.py set TTS_API_KEY --value "$TTS_API_KEY"
python manage_secrets.py set ADMIN_TOKEN --value "$ADMIN_TOKEN"
```
### Docker
```dockerfile
# Dockerfile
FROM python:3.9
# Copy encrypted secrets (not the master key!)
COPY .secrets.json /app/.secrets.json
# Master key provided at runtime
ENV MASTER_KEY=""
# Run with:
# docker run -e MASTER_KEY="$MASTER_KEY" myapp
```
### Kubernetes
```yaml
# secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: talk2me-master-key
type: Opaque
stringData:
  master-key: "your-master-key"
---
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
      - name: talk2me
        env:
        - name: MASTER_KEY
          valueFrom:
            secretKeyRef:
              name: talk2me-master-key
              key: master-key
```
## Troubleshooting
### Lost Master Key
If you lose the master key:
1. You'll need to recreate all secrets
2. Generate new master key: `python manage_secrets.py init`
3. Re-enter all secret values
### Corrupted Secrets File
```bash
# Check integrity
python manage_secrets.py verify
# If corrupted, restore from backup or reinitialize
```
### Permission Errors
```bash
# Fix file permissions
chmod 600 .secrets.json .master_key
chown $USER:$USER .secrets.json .master_key
```
## Monitoring
### Audit Logs
Review secret access patterns:
```bash
# View all audit entries
python manage_secrets.py audit
# Check specific secret
python manage_secrets.py audit TTS_API_KEY
# Export for analysis
python manage_secrets.py audit > audit.log
```
### Rotation Monitoring
```bash
# Check rotation status
python manage_secrets.py check-rotation
# Set up cron job for automatic checks
0 0 * * * /path/to/python /path/to/manage_secrets.py check-rotation
```
## Migration Guide
### From Environment Variables
```bash
# Automatic migration
python manage_secrets.py migrate
# Manual migration
export OLD_API_KEY="your-key"
python manage_secrets.py set API_KEY --value "$OLD_API_KEY"
unset OLD_API_KEY
```
### From .env Files
```python
# migrate_env.py
from dotenv import dotenv_values
from secrets_manager import get_secrets_manager
env_values = dotenv_values('.env')
manager = get_secrets_manager()
for key, value in env_values.items():
    if key.endswith('_KEY') or key.endswith('_TOKEN'):
        manager.set(key, value, {'migrated_from': '.env'})
```
## API Reference
### Python API
```python
from secrets_manager import get_secret, set_secret
# Get a secret
api_key = get_secret('TTS_API_KEY', default='')
# Set a secret
set_secret('NEW_API_KEY', 'value', metadata={'service': 'external'})
# Advanced usage
from secrets_manager import get_secrets_manager
manager = get_secrets_manager()
manager.rotate('API_KEY')
manager.schedule_rotation('TOKEN', days=30)
```
### Flask CLI
```bash
# Via Flask CLI
flask secrets-list
flask secrets-set
flask secrets-rotate
flask secrets-check-rotation
```
## Security Considerations
1. **Never log secret values**
2. **Use secure random generation for new secrets**
3. **Implement proper access controls**
4. **Regular security audits**
5. **Incident response plan for compromised secrets**
## Future Enhancements
- Integration with cloud KMS (AWS, Azure, GCP)
- Hardware security module (HSM) support
- Secret sharing (Shamir's Secret Sharing)
- Time-based access controls
- Automated compliance reporting


@@ -1,173 +0,0 @@
# Security Configuration Guide
This document outlines security best practices for deploying Talk2Me.
## Secrets Management
Talk2Me includes a comprehensive secrets management system with encryption, rotation, and audit logging.
### Quick Start
```bash
# Initialize secrets management
python manage_secrets.py init
# Set a secret
python manage_secrets.py set TTS_API_KEY
# List secrets
python manage_secrets.py list
# Rotate secrets
python manage_secrets.py rotate ADMIN_TOKEN
```
See [SECRETS_MANAGEMENT.md](SECRETS_MANAGEMENT.md) for detailed documentation.
## Environment Variables
**NEVER commit sensitive information like API keys, passwords, or secrets to version control.**
### Required Security Configuration
1. **TTS_API_KEY**
   - Required for TTS server authentication
   - Set via environment variable: `export TTS_API_KEY="your-api-key"`
   - Or use a `.env` file (see `.env.example`)
2. **SECRET_KEY**
   - Required for Flask session security
   - Generate a secure key: `python -c "import secrets; print(secrets.token_hex(32))"`
   - Set via: `export SECRET_KEY="your-generated-key"`
3. **ADMIN_TOKEN**
   - Required for admin endpoints
   - Generate a secure token: `python -c "import secrets; print(secrets.token_urlsafe(32))"`
   - Set via: `export ADMIN_TOKEN="your-admin-token"`
### Using a .env File (Recommended)
1. Copy the example file:
```bash
cp .env.example .env
```
2. Edit `.env` with your actual values:
```bash
nano .env # or your preferred editor
```
3. Load environment variables:
```bash
# Using python-dotenv (add to requirements.txt)
pip install python-dotenv
# Or source manually
source .env
```
### Python-dotenv Integration
To automatically load `.env` files, add this to the top of `app.py`:
```python
from dotenv import load_dotenv
load_dotenv() # Load .env file if it exists
```
### Production Deployment
For production deployments:
1. **Use a secrets management service**:
   - AWS Secrets Manager
   - HashiCorp Vault
   - Azure Key Vault
   - Google Secret Manager
2. **Set environment variables securely**:
   - Use your platform's environment configuration
   - Never expose secrets in logs or error messages
   - Rotate keys regularly
3. **Additional security measures**:
   - Use HTTPS only
   - Enable CORS restrictions
   - Implement rate limiting
   - Monitor for suspicious activity
### Docker Deployment
When using Docker:
```dockerfile
# Use build arguments for non-sensitive config
ARG TTS_SERVER_URL=http://localhost:5050/v1/audio/speech
# Use runtime environment for secrets
ENV TTS_API_KEY=""
```
Run with:
```bash
docker run -e TTS_API_KEY="your-key" -e SECRET_KEY="your-secret" talk2me
```
### Kubernetes Deployment
Use Kubernetes secrets:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: talk2me-secrets
type: Opaque
stringData:
  tts-api-key: "your-api-key"
  flask-secret-key: "your-secret-key"
  admin-token: "your-admin-token"
```
### Rate Limiting
Talk2Me implements comprehensive rate limiting to prevent abuse:
1. **Per-Endpoint Limits**:
   - Transcription: 10/min, 100/hour
   - Translation: 20/min, 300/hour
   - TTS: 15/min, 200/hour
2. **Global Limits**:
   - 1,000 requests/minute total
   - 50 concurrent requests maximum
3. **Automatic Protection**:
   - IP blocking for excessive requests
   - Request size validation
   - Burst control
See [RATE_LIMITING.md](RATE_LIMITING.md) for configuration details.
### Security Checklist
- [ ] All API keys removed from source code
- [ ] Environment variables configured
- [ ] `.env` file added to `.gitignore`
- [ ] Secrets rotated after any potential exposure
- [ ] HTTPS enabled in production
- [ ] CORS properly configured
- [ ] Rate limiting enabled and configured
- [ ] Admin endpoints protected with authentication
- [ ] Error messages don't expose sensitive info
- [ ] Logs sanitized of sensitive data
- [ ] Request size limits enforced
- [ ] IP blocking configured for abuse prevention
### Reporting Security Issues
If you discover a security vulnerability, please report it to:
- Create a private security advisory on GitHub
- Or email: security@yourdomain.com
Do not create public issues for security vulnerabilities.


@@ -1,366 +0,0 @@
# Session Management Documentation
This document describes the session management system implemented in Talk2Me to prevent resource leaks from abandoned sessions.
## Overview
Talk2Me implements a comprehensive session management system that tracks user sessions and associated resources (audio files, temporary files, streams) to ensure proper cleanup and prevent resource exhaustion.
## Features
### 1. Automatic Resource Tracking
All resources created during a user session are automatically tracked:
- Audio files (uploads and generated)
- Temporary files
- Active streams
- Resource metadata (size, creation time, purpose)
### 2. Resource Limits
Per-session limits prevent resource exhaustion:
- Maximum resources per session: 100
- Maximum storage per session: 100MB
- Automatic cleanup of oldest resources when limits are reached
### 3. Session Lifecycle Management
Sessions are automatically managed:
- Created on first request
- Updated on each request
- Cleaned up when idle (15 minutes)
- Removed when expired (1 hour)
### 4. Automatic Cleanup
Background cleanup processes run automatically:
- Idle session cleanup (every minute)
- Expired session cleanup (every minute)
- Orphaned file cleanup (every minute)
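The decision logic behind the idle and expiry sweeps can be sketched separately from the background thread that runs it every `SESSION_CLEANUP_INTERVAL` seconds (class and method names here are hypothetical):

```python
import time

class SessionSweeper:
    """Pure sweep logic behind the background cleanup loop: decide which
    sessions are expired or idle at a given instant."""

    def __init__(self, max_duration=3600, max_idle=900):
        self.max_duration = max_duration  # 1 hour default
        self.max_idle = max_idle          # 15 minutes default

    def to_clean(self, sessions, now=None):
        """`sessions` maps id -> (created_at, last_activity) as epoch seconds.
        Returns (session_id, reason) pairs for sessions needing cleanup."""
        now = now if now is not None else time.time()
        doomed = []
        for sid, (created, last_active) in sessions.items():
            if now - created > self.max_duration:
                doomed.append((sid, 'expired'))
            elif now - last_active > self.max_idle:
                doomed.append((sid, 'idle'))
        return doomed
```

In the application this decision step would run inside a daemon thread, with the actual resource deletion (files, streams) performed for each returned session id.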
## Configuration
Session management can be configured via environment variables or Flask config:
```python
# app.py or config.py
app.config.update({
    'MAX_SESSION_DURATION': 3600,        # 1 hour
    'MAX_SESSION_IDLE_TIME': 900,        # 15 minutes
    'MAX_RESOURCES_PER_SESSION': 100,
    'MAX_BYTES_PER_SESSION': 104857600,  # 100MB
    'SESSION_CLEANUP_INTERVAL': 60,      # 1 minute
    'SESSION_STORAGE_PATH': '/path/to/sessions'
})
```
## API Endpoints
### Admin Endpoints
All admin endpoints require authentication via `X-Admin-Token` header.
#### GET /admin/sessions
Get information about all active sessions.
```bash
curl -H "X-Admin-Token: your-token" http://localhost:5005/admin/sessions
```
Response:
```json
{
  "sessions": [
    {
      "session_id": "uuid",
      "user_id": null,
      "ip_address": "192.168.1.1",
      "created_at": "2024-01-15T10:00:00",
      "last_activity": "2024-01-15T10:05:00",
      "duration_seconds": 300,
      "idle_seconds": 0,
      "request_count": 5,
      "resource_count": 3,
      "total_bytes_used": 1048576,
      "resources": [...]
    }
  ],
  "stats": {
    "total_sessions_created": 100,
    "total_sessions_cleaned": 50,
    "active_sessions": 5,
    "avg_session_duration": 600,
    "avg_resources_per_session": 4.2
  }
}
```
#### GET /admin/sessions/{session_id}
Get detailed information about a specific session.
```bash
curl -H "X-Admin-Token: your-token" http://localhost:5005/admin/sessions/abc123
```
#### POST /admin/sessions/{session_id}/cleanup
Manually cleanup a specific session.
```bash
curl -X POST -H "X-Admin-Token: your-token" \
http://localhost:5005/admin/sessions/abc123/cleanup
```
#### GET /admin/sessions/metrics
Get session management metrics for monitoring.
```bash
curl -H "X-Admin-Token: your-token" http://localhost:5005/admin/sessions/metrics
```
Response:
```json
{
  "sessions": {
    "active": 5,
    "total_created": 100,
    "total_cleaned": 95
  },
  "resources": {
    "active": 20,
    "total_cleaned": 380,
    "active_bytes": 10485760,
    "total_bytes_cleaned": 1073741824
  },
  "limits": {
    "max_session_duration": 3600,
    "max_idle_time": 900,
    "max_resources_per_session": 100,
    "max_bytes_per_session": 104857600
  }
}
```
## CLI Commands
Session management can be controlled via Flask CLI commands:
```bash
# List all active sessions
flask sessions-list
# Manual cleanup
flask sessions-cleanup
# Show statistics
flask sessions-stats
```
## Usage Examples
### 1. Monitor Active Sessions
```python
import requests
headers = {'X-Admin-Token': 'your-admin-token'}
response = requests.get('http://localhost:5005/admin/sessions', headers=headers)
sessions = response.json()
for session in sessions['sessions']:
    print(f"Session {session['session_id']}:")
    print(f"  IP: {session['ip_address']}")
    print(f"  Resources: {session['resource_count']}")
    print(f"  Storage: {session['total_bytes_used'] / 1024 / 1024:.2f} MB")
```
### 2. Cleanup Idle Sessions
```python
# Get all sessions
response = requests.get('http://localhost:5005/admin/sessions', headers=headers)
sessions = response.json()['sessions']
# Find idle sessions
idle_threshold = 300 # 5 minutes
for session in sessions:
    if session['idle_seconds'] > idle_threshold:
        # Cleanup idle session
        cleanup_url = f'http://localhost:5005/admin/sessions/{session["session_id"]}/cleanup'
        requests.post(cleanup_url, headers=headers)
        print(f"Cleaned up idle session {session['session_id']}")
```
### 3. Monitor Resource Usage
```python
# Get metrics
response = requests.get('http://localhost:5005/admin/sessions/metrics', headers=headers)
metrics = response.json()
print(f"Active sessions: {metrics['sessions']['active']}")
print(f"Active resources: {metrics['resources']['active']}")
print(f"Storage used: {metrics['resources']['active_bytes'] / 1024 / 1024:.2f} MB")
print(f"Total cleaned: {metrics['resources']['total_bytes_cleaned'] / 1024 / 1024 / 1024:.2f} GB")
```
## Resource Types
The session manager tracks different types of resources:
### 1. Audio Files
- Uploaded audio files for transcription
- Generated audio files from TTS
- Automatically cleaned up after session ends
### 2. Temporary Files
- Processing intermediates
- Cache files
- Automatically cleaned up after use
### 3. Streams
- WebSocket connections
- Server-sent event streams
- Closed when session ends
## Best Practices
### 1. Session Configuration
```python
# Development
app.config.update({
    'MAX_SESSION_DURATION': 7200,    # 2 hours
    'MAX_SESSION_IDLE_TIME': 1800,   # 30 minutes
    'MAX_RESOURCES_PER_SESSION': 200,
    'MAX_BYTES_PER_SESSION': 209715200  # 200MB
})

# Production
app.config.update({
    'MAX_SESSION_DURATION': 3600,    # 1 hour
    'MAX_SESSION_IDLE_TIME': 900,    # 15 minutes
    'MAX_RESOURCES_PER_SESSION': 100,
    'MAX_BYTES_PER_SESSION': 104857600  # 100MB
})
```
### 2. Monitoring
Set up monitoring for:
- Number of active sessions
- Resource usage per session
- Cleanup frequency
- Failed cleanup attempts
### 3. Alerting
Configure alerts for:
- High number of active sessions (>1000)
- High resource usage (>80% of limits)
- Failed cleanup operations
- Orphaned files detected
## Troubleshooting
### Common Issues
#### 1. Sessions Not Being Cleaned Up
Check cleanup thread status:
```bash
flask sessions-stats
```
Manual cleanup:
```bash
flask sessions-cleanup
```
#### 2. Resource Limits Reached
Check session details:
```bash
curl -H "X-Admin-Token: token" http://localhost:5005/admin/sessions/SESSION_ID
```
Increase limits if needed:
```python
app.config['MAX_RESOURCES_PER_SESSION'] = 200
app.config['MAX_BYTES_PER_SESSION'] = 209715200 # 200MB
```
#### 3. Orphaned Files
Check for orphaned files:
```bash
ls -la /path/to/session/storage/
```
Clean orphaned files:
```bash
flask sessions-cleanup
```
### Debug Logging
Enable debug logging for session management:
```python
import logging
# Enable session manager debug logs
logging.getLogger('session_manager').setLevel(logging.DEBUG)
```
## Security Considerations
1. **Session Hijacking**: Sessions are tied to IP addresses and user agents
2. **Resource Exhaustion**: Strict per-session limits prevent DoS attacks
3. **File System Access**: Session storage uses secure paths and permissions
4. **Admin Access**: All admin endpoints require authentication
## Performance Impact
The session management system has minimal performance impact:
- Memory: ~1KB per session + resource metadata
- CPU: Background cleanup runs every minute
- Disk I/O: Cleanup operations are batched
- Network: No external dependencies
## Integration with Other Systems
### Rate Limiting
Session management integrates with rate limiting:
```python
# Sessions are automatically tracked per IP
# Rate limits apply per session
```
### Secrets Management
Session tokens can be encrypted:
```python
from secrets_manager import encrypt_value
encrypted_session = encrypt_value(session_id)
```
### Monitoring
Export metrics to monitoring systems:
```python
# Prometheus format
@app.route('/metrics')
def prometheus_metrics():
    metrics = app.session_manager.export_metrics()
    # Format as Prometheus metrics
    return format_prometheus(metrics)
```
## Future Enhancements
1. **Session Persistence**: Store sessions in Redis/database
2. **Distributed Sessions**: Support for multi-server deployments
3. **Session Analytics**: Track usage patterns and trends
4. **Resource Quotas**: Per-user resource quotas
5. **Session Replay**: Debug issues by replaying sessions

docker-compose.amd.yml Normal file

@@ -0,0 +1,19 @@
version: '3.8'
# Docker Compose override for AMD GPU support (ROCm)
# Usage: docker-compose -f docker-compose.yml -f docker-compose.amd.yml up
services:
  talk2me:
    environment:
      - HSA_OVERRIDE_GFX_VERSION=10.3.0  # Adjust based on your GPU model
      - ROCR_VISIBLE_DEVICES=0           # Use first GPU
    volumes:
      - /dev/kfd:/dev/kfd  # ROCm KFD interface
      - /dev/dri:/dev/dri  # Direct Rendering Interface
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - video   # Required for GPU access
      - render  # Required for GPU access

docker-compose.apple.yml Normal file

@@ -0,0 +1,11 @@
version: '3.8'
# Docker Compose override for Apple Silicon
# Usage: docker-compose -f docker-compose.yml -f docker-compose.apple.yml up
services:
  talk2me:
    platform: linux/arm64/v8  # For M1/M2/M3 Macs
    environment:
      - PYTORCH_ENABLE_MPS_FALLBACK=1         # Enable Metal Performance Shaders fallback
      - PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7  # Memory management for MPS

docker-compose.nvidia.yml Normal file

@@ -0,0 +1,16 @@
version: '3.8'
# Docker Compose override for NVIDIA GPU support
# Usage: docker-compose -f docker-compose.yml -f docker-compose.nvidia.yml up
services:
  talk2me:
    environment:
      - CUDA_VISIBLE_DEVICES=0  # Use first GPU
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

setup-script.sh

@@ -1,776 +0,0 @@
#!/bin/bash
# Create necessary directories
mkdir -p templates static/{css,js}
# Move HTML template to templates directory
cat > templates/index.html << 'EOL'
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Voice Language Translator</title>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0-alpha1/dist/css/bootstrap.min.css" rel="stylesheet">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0/css/all.min.css">
<style>
body {
padding-top: 20px;
padding-bottom: 20px;
background-color: #f8f9fa;
}
.record-btn {
width: 80px;
height: 80px;
border-radius: 50%;
display: flex;
align-items: center;
justify-content: center;
font-size: 32px;
margin: 20px auto;
box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
transition: all 0.3s;
}
.record-btn:active {
transform: scale(0.95);
box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
}
.recording {
background-color: #dc3545 !important;
animation: pulse 1.5s infinite;
}
@keyframes pulse {
0% {
transform: scale(1);
}
50% {
transform: scale(1.05);
}
100% {
transform: scale(1);
}
}
.card {
border-radius: 15px;
box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
margin-bottom: 20px;
}
.card-header {
border-radius: 15px 15px 0 0 !important;
}
.language-select {
border-radius: 10px;
padding: 10px;
}
.text-display {
min-height: 100px;
padding: 15px;
background-color: #f8f9fa;
border-radius: 10px;
margin-bottom: 15px;
}
.btn-action {
border-radius: 10px;
padding: 8px 15px;
margin: 5px;
}
.spinner-border {
width: 1rem;
height: 1rem;
margin-right: 5px;
}
.status-indicator {
font-size: 0.9rem;
font-style: italic;
color: #6c757d;
}
</style>
</head>
<body>
<div class="container">
<h1 class="text-center mb-4">Voice Language Translator</h1>
<p class="text-center text-muted">Powered by Gemma 3, Whisper & Edge TTS</p>
<div class="row">
<div class="col-md-6 mb-3">
<div class="card">
<div class="card-header bg-primary text-white">
<h5 class="mb-0">Source</h5>
</div>
<div class="card-body">
<select id="sourceLanguage" class="form-select language-select mb-3">
{% for language in languages %}
<option value="{{ language }}">{{ language }}</option>
{% endfor %}
</select>
<div class="text-display" id="sourceText">
<p class="text-muted">Your transcribed text will appear here...</p>
</div>
<div class="d-flex justify-content-between">
<button id="playSource" class="btn btn-outline-primary btn-action" disabled>
<i class="fas fa-play"></i> Play
</button>
<button id="clearSource" class="btn btn-outline-secondary btn-action">
<i class="fas fa-trash"></i> Clear
</button>
</div>
</div>
</div>
</div>
<div class="col-md-6 mb-3">
<div class="card">
<div class="card-header bg-success text-white">
<h5 class="mb-0">Translation</h5>
</div>
<div class="card-body">
<select id="targetLanguage" class="form-select language-select mb-3">
{% for language in languages %}
<option value="{{ language }}">{{ language }}</option>
{% endfor %}
</select>
<div class="text-display" id="translatedText">
<p class="text-muted">Translation will appear here...</p>
</div>
<div class="d-flex justify-content-between">
<button id="playTranslation" class="btn btn-outline-success btn-action" disabled>
<i class="fas fa-play"></i> Play
</button>
<button id="clearTranslation" class="btn btn-outline-secondary btn-action">
<i class="fas fa-trash"></i> Clear
</button>
</div>
</div>
</div>
</div>
</div>
<div class="text-center">
<button id="recordBtn" class="btn btn-primary record-btn">
<i class="fas fa-microphone"></i>
</button>
<p class="status-indicator" id="statusIndicator">Click to start recording</p>
</div>
<div class="text-center mt-3">
<button id="translateBtn" class="btn btn-success" disabled>
<i class="fas fa-language"></i> Translate
</button>
</div>
<div class="mt-3">
<div class="progress d-none" id="progressContainer">
<div id="progressBar" class="progress-bar progress-bar-striped progress-bar-animated" role="progressbar" style="width: 0%"></div>
</div>
</div>
<audio id="audioPlayer" style="display: none;"></audio>
</div>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0-alpha1/dist/js/bootstrap.bundle.min.js"></script>
<script>
document.addEventListener('DOMContentLoaded', function() {
// DOM elements
const recordBtn = document.getElementById('recordBtn');
const translateBtn = document.getElementById('translateBtn');
const sourceText = document.getElementById('sourceText');
const translatedText = document.getElementById('translatedText');
const sourceLanguage = document.getElementById('sourceLanguage');
const targetLanguage = document.getElementById('targetLanguage');
const playSource = document.getElementById('playSource');
const playTranslation = document.getElementById('playTranslation');
const clearSource = document.getElementById('clearSource');
const clearTranslation = document.getElementById('clearTranslation');
const statusIndicator = document.getElementById('statusIndicator');
const progressContainer = document.getElementById('progressContainer');
const progressBar = document.getElementById('progressBar');
const audioPlayer = document.getElementById('audioPlayer');
// Set initial values
let isRecording = false;
let mediaRecorder = null;
let audioChunks = [];
let currentSourceText = '';
let currentTranslationText = '';
// Make sure target language is different from source
if (targetLanguage.options[0].value === sourceLanguage.value) {
targetLanguage.selectedIndex = 1;
}
// Event listeners for language selection
sourceLanguage.addEventListener('change', function() {
if (targetLanguage.value === sourceLanguage.value) {
for (let i = 0; i < targetLanguage.options.length; i++) {
if (targetLanguage.options[i].value !== sourceLanguage.value) {
targetLanguage.selectedIndex = i;
break;
}
}
}
});
targetLanguage.addEventListener('change', function() {
if (targetLanguage.value === sourceLanguage.value) {
for (let i = 0; i < sourceLanguage.options.length; i++) {
if (sourceLanguage.options[i].value !== targetLanguage.value) {
sourceLanguage.selectedIndex = i;
break;
}
}
}
});
// Record button click event
recordBtn.addEventListener('click', function() {
if (isRecording) {
stopRecording();
} else {
startRecording();
}
});
// Function to start recording
function startRecording() {
navigator.mediaDevices.getUserMedia({ audio: true })
.then(stream => {
mediaRecorder = new MediaRecorder(stream);
audioChunks = [];
mediaRecorder.addEventListener('dataavailable', event => {
audioChunks.push(event.data);
});
mediaRecorder.addEventListener('stop', () => {
const audioBlob = new Blob(audioChunks, { type: 'audio/wav' });
transcribeAudio(audioBlob);
});
mediaRecorder.start();
isRecording = true;
recordBtn.classList.add('recording');
recordBtn.classList.replace('btn-primary', 'btn-danger');
recordBtn.innerHTML = '<i class="fas fa-stop"></i>';
statusIndicator.textContent = 'Recording... Click to stop';
})
.catch(error => {
console.error('Error accessing microphone:', error);
alert('Error accessing microphone. Please make sure you have given permission for microphone access.');
});
}
// Function to stop recording
function stopRecording() {
mediaRecorder.stop();
isRecording = false;
recordBtn.classList.remove('recording');
recordBtn.classList.replace('btn-danger', 'btn-primary');
recordBtn.innerHTML = '<i class="fas fa-microphone"></i>';
statusIndicator.textContent = 'Processing audio...';
// Stop all audio tracks
mediaRecorder.stream.getTracks().forEach(track => track.stop());
}
// Function to transcribe audio
function transcribeAudio(audioBlob) {
const formData = new FormData();
formData.append('audio', audioBlob);
formData.append('source_lang', sourceLanguage.value);
showProgress();
fetch('/transcribe', {
method: 'POST',
body: formData
})
.then(response => response.json())
.then(data => {
hideProgress();
if (data.success) {
currentSourceText = data.text;
sourceText.innerHTML = `<p>${data.text}</p>`;
playSource.disabled = false;
translateBtn.disabled = false;
statusIndicator.textContent = 'Transcription complete';
} else {
sourceText.innerHTML = `<p class="text-danger">Error: ${data.error}</p>`;
statusIndicator.textContent = 'Transcription failed';
}
})
.catch(error => {
hideProgress();
console.error('Transcription error:', error);
sourceText.innerHTML = `<p class="text-danger">Failed to transcribe audio. Please try again.</p>`;
statusIndicator.textContent = 'Transcription failed';
});
}
// Translate button click event
translateBtn.addEventListener('click', function() {
if (!currentSourceText) {
return;
}
statusIndicator.textContent = 'Translating...';
showProgress();
fetch('/translate', {
method: 'POST',
headers: {
'Content-Type': 'application/json'
},
body: JSON.stringify({
text: currentSourceText,
source_lang: sourceLanguage.value,
target_lang: targetLanguage.value
})
})
.then(response => response.json())
.then(data => {
hideProgress();
if (data.success) {
currentTranslationText = data.translation;
translatedText.innerHTML = `<p>${data.translation}</p>`;
playTranslation.disabled = false;
statusIndicator.textContent = 'Translation complete';
} else {
translatedText.innerHTML = `<p class="text-danger">Error: ${data.error}</p>`;
statusIndicator.textContent = 'Translation failed';
}
})
.catch(error => {
hideProgress();
console.error('Translation error:', error);
translatedText.innerHTML = `<p class="text-danger">Failed to translate. Please try again.</p>`;
statusIndicator.textContent = 'Translation failed';
});
});
// Play source text
playSource.addEventListener('click', function() {
if (!currentSourceText) return;
playAudio(currentSourceText, sourceLanguage.value);
statusIndicator.textContent = 'Playing source audio...';
});
// Play translation
playTranslation.addEventListener('click', function() {
if (!currentTranslationText) return;
playAudio(currentTranslationText, targetLanguage.value);
statusIndicator.textContent = 'Playing translation audio...';
});
// Function to play audio via TTS
function playAudio(text, language) {
showProgress();
fetch('/speak', {
method: 'POST',
headers: {
'Content-Type': 'application/json'
},
body: JSON.stringify({
text: text,
language: language
})
})
.then(response => response.json())
.then(data => {
hideProgress();
if (data.success) {
audioPlayer.src = data.audio_url;
audioPlayer.onended = function() {
statusIndicator.textContent = 'Ready';
};
audioPlayer.play();
} else {
statusIndicator.textContent = 'TTS failed';
alert('Failed to play audio: ' + data.error);
}
})
.catch(error => {
hideProgress();
console.error('TTS error:', error);
statusIndicator.textContent = 'TTS failed';
});
}
// Clear buttons
clearSource.addEventListener('click', function() {
sourceText.innerHTML = '<p class="text-muted">Your transcribed text will appear here...</p>';
currentSourceText = '';
playSource.disabled = true;
translateBtn.disabled = true;
});
clearTranslation.addEventListener('click', function() {
translatedText.innerHTML = '<p class="text-muted">Translation will appear here...</p>';
currentTranslationText = '';
playTranslation.disabled = true;
});
// Progress indicator functions
function showProgress() {
progressContainer.classList.remove('d-none');
let progress = 0;
const interval = setInterval(() => {
progress += 5;
if (progress > 90) {
clearInterval(interval);
}
progressBar.style.width = `${progress}%`;
}, 100);
progressBar.dataset.interval = interval;
}
function hideProgress() {
const interval = progressBar.dataset.interval;
if (interval) {
clearInterval(Number(interval));
}
progressBar.style.width = '100%';
setTimeout(() => {
progressContainer.classList.add('d-none');
progressBar.style.width = '0%';
}, 500);
}
});
</script>
</body>
</html>
EOL
# Create app.py
cat > app.py << 'EOL'
import os
import time
import tempfile
import requests
import json
from flask import Flask, render_template, request, jsonify, Response, send_file
import whisper
import torch
import ollama
import logging

app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = tempfile.mkdtemp()
app.config['TTS_SERVER'] = os.environ.get('TTS_SERVER_URL', 'http://localhost:5050/v1/audio/speech')
app.config['TTS_API_KEY'] = os.environ.get('TTS_API_KEY', 'your_api_key_here')

# Add a route to check TTS server status
@app.route('/check_tts_server', methods=['GET'])
def check_tts_server():
    try:
        # Try a simple HTTP request to the TTS server's status endpoint,
        # derived from the configured speech endpoint URL
        base_url = app.config['TTS_SERVER'].rsplit('/v1/audio/speech', 1)[0]
        response = requests.get(base_url + '/status', timeout=5)
        if response.status_code == 200:
            return jsonify({
                'status': 'online',
                'url': app.config['TTS_SERVER']
            })
        else:
            return jsonify({
                'status': 'error',
                'message': f'TTS server returned status code {response.status_code}',
                'url': app.config['TTS_SERVER']
            })
    except requests.exceptions.RequestException as e:
        return jsonify({
            'status': 'error',
            'message': f'Cannot connect to TTS server: {str(e)}',
            'url': app.config['TTS_SERVER']
        })

# Initialize logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Load Whisper model
logger.info("Loading Whisper model...")
whisper_model = whisper.load_model("base")
logger.info("Whisper model loaded successfully")

# Supported languages
SUPPORTED_LANGUAGES = {
    "ar": "Arabic",
    "hy": "Armenian",
    "az": "Azerbaijani",
    "en": "English",
    "fr": "French",
    "ka": "Georgian",
    "kk": "Kazakh",
    "zh": "Mandarin",
    "fa": "Farsi",
    "pt": "Portuguese",
    "ru": "Russian",
    "es": "Spanish",
    "tr": "Turkish",
    "uz": "Uzbek"
}

# Map language names to language codes
LANGUAGE_TO_CODE = {v: k for k, v in SUPPORTED_LANGUAGES.items()}

# Map language names to OpenAI TTS voice options
LANGUAGE_TO_VOICE = {
    "Arabic": "alloy",      # Using OpenAI general voices
    "Armenian": "echo",     # as OpenAI doesn't have specific voices
    "Azerbaijani": "nova",  # for all these languages
    "English": "echo",      # We'll use the available voices
    "French": "alloy",      # and rely on the translation being
    "Georgian": "fable",    # in the correct language text
    "Kazakh": "onyx",
    "Mandarin": "shimmer",
    "Farsi": "nova",
    "Portuguese": "alloy",
    "Russian": "echo",
    "Spanish": "nova",
    "Turkish": "fable",
    "Uzbek": "onyx"
}

@app.route('/')
def index():
    return render_template('index.html', languages=sorted(SUPPORTED_LANGUAGES.values()))

@app.route('/transcribe', methods=['POST'])
def transcribe():
    if 'audio' not in request.files:
        return jsonify({'error': 'No audio file provided'}), 400
    audio_file = request.files['audio']
    source_lang = request.form.get('source_lang', '')
    # Save the audio file temporarily
    temp_path = os.path.join(app.config['UPLOAD_FOLDER'], 'input_audio.wav')
    audio_file.save(temp_path)
    try:
        # Use Whisper for transcription
        result = whisper_model.transcribe(
            temp_path,
            language=LANGUAGE_TO_CODE.get(source_lang, None)
        )
        transcribed_text = result["text"]
        return jsonify({
            'success': True,
            'text': transcribed_text
        })
    except Exception as e:
        logger.error(f"Transcription error: {str(e)}")
        return jsonify({'error': f'Transcription failed: {str(e)}'}), 500
    finally:
        # Clean up the temporary file
        if os.path.exists(temp_path):
            os.remove(temp_path)

@app.route('/translate', methods=['POST'])
def translate():
    try:
        data = request.json
        text = data.get('text', '')
        source_lang = data.get('source_lang', '')
        target_lang = data.get('target_lang', '')
        if not text or not source_lang or not target_lang:
            return jsonify({'error': 'Missing required parameters'}), 400
        # Create a prompt for Gemma 3 translation
        prompt = f"""
Translate the following text from {source_lang} to {target_lang}:

"{text}"

Provide only the translation without any additional text.
"""
        # Use Ollama to interact with Gemma 3
        response = ollama.chat(
            model="gemma3",
            messages=[
                {
                    "role": "user",
                    "content": prompt
                }
            ]
        )
        translated_text = response['message']['content'].strip()
        return jsonify({
            'success': True,
            'translation': translated_text
        })
    except Exception as e:
        logger.error(f"Translation error: {str(e)}")
        return jsonify({'error': f'Translation failed: {str(e)}'}), 500

@app.route('/speak', methods=['POST'])
def speak():
    try:
        data = request.json
        text = data.get('text', '')
        language = data.get('language', '')
        if not text or not language:
            return jsonify({'error': 'Missing required parameters'}), 400
        voice = LANGUAGE_TO_VOICE.get(language)
        if not voice:
            return jsonify({'error': 'Unsupported language for TTS'}), 400
        # Get TTS server URL from environment or config
        tts_server_url = app.config['TTS_SERVER']
        try:
            # Request TTS from the Edge TTS server
            logger.info(f"Sending TTS request to {tts_server_url}")
            tts_response = requests.post(
                tts_server_url,
                json={
                    'text': text,
                    'voice': voice,
                    'output_format': 'mp3'
                },
                timeout=10  # Add timeout
            )
            logger.info(f"TTS response status: {tts_response.status_code}")
            if tts_response.status_code != 200:
                error_msg = f'TTS request failed with status {tts_response.status_code}'
                logger.error(error_msg)
                # Try to get error details from response if possible
                try:
                    error_details = tts_response.json()
                    logger.error(f"Error details: {error_details}")
                except ValueError:
                    pass
                return jsonify({'error': error_msg}), 500
            # The response contains the audio data directly
            temp_audio_path = os.path.join(app.config['UPLOAD_FOLDER'], f'output_{int(time.time())}.mp3')
            with open(temp_audio_path, 'wb') as f:
                f.write(tts_response.content)
            return jsonify({
                'success': True,
                'audio_url': f'/get_audio/{os.path.basename(temp_audio_path)}'
            })
        except requests.exceptions.RequestException as e:
            error_msg = f'Failed to connect to TTS server: {str(e)}'
            logger.error(error_msg)
            return jsonify({'error': error_msg}), 500
    except Exception as e:
        logger.error(f"TTS error: {str(e)}")
        return jsonify({'error': f'TTS failed: {str(e)}'}), 500

@app.route('/get_audio/<filename>')
def get_audio(filename):
    try:
        file_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
        return send_file(file_path, mimetype='audio/mpeg')
    except Exception as e:
        logger.error(f"Audio retrieval error: {str(e)}")
        return jsonify({'error': f'Audio retrieval failed: {str(e)}'}), 500

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000, debug=True)
EOL
# Create requirements.txt
cat > requirements.txt << 'EOL'
flask==2.3.2
requests==2.31.0
openai-whisper==20231117
torch==2.1.0
ollama==0.1.5
EOL
# Create README.md
cat > README.md << 'EOL'
# Voice Language Translator
A mobile-friendly web application that translates spoken language between multiple languages using:
- Gemma 3 open-source LLM via Ollama for translation
- OpenAI Whisper for speech-to-text
- OpenAI Edge TTS for text-to-speech
## Supported Languages
- Arabic
- Armenian
- Azerbaijani
- English
- French
- Georgian
- Kazakh
- Mandarin
- Farsi
- Portuguese
- Russian
- Spanish
- Turkish
- Uzbek
## Setup Instructions
1. Install the required Python packages:
```
pip install -r requirements.txt
```
2. Make sure you have Ollama installed and the Gemma 3 model loaded:
```
ollama pull gemma3
```
3. Ensure your OpenAI Edge TTS server is running on port 5050.
4. Run the application:
```
python app.py
```
5. Open your browser and navigate to:
```
http://localhost:8000
```
## Usage
1. Select your source language from the dropdown menu
2. Press the microphone button and speak
3. Press the button again to stop recording
4. Wait for the transcription to complete
5. Select your target language
6. Press the "Translate" button
7. Use the play buttons to hear the original or translated text
## Technical Details
- The app uses Flask for the web server
- Audio is processed client-side using the MediaRecorder API
- Whisper for speech recognition with language hints
- Ollama provides access to the Gemma 3 model for translation
- OpenAI Edge TTS delivers natural-sounding speech output
## Mobile Support
The interface is fully responsive and designed to work well on mobile devices.
EOL
# Make the script executable
chmod +x app.py
echo "Setup complete! Run the app with: python app.py"

File diff suppressed because it is too large.

test-cors.html

@@ -1,228 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>CORS Test for Talk2Me</title>
<style>
body {
font-family: Arial, sans-serif;
max-width: 800px;
margin: 50px auto;
padding: 20px;
}
.test-result {
margin: 10px 0;
padding: 10px;
border-radius: 5px;
}
.success {
background-color: #d4edda;
color: #155724;
border: 1px solid #c3e6cb;
}
.error {
background-color: #f8d7da;
color: #721c24;
border: 1px solid #f5c6cb;
}
button {
background-color: #007bff;
color: white;
padding: 10px 20px;
border: none;
border-radius: 5px;
cursor: pointer;
margin: 5px;
}
button:hover {
background-color: #0056b3;
}
input {
width: 100%;
padding: 10px;
margin: 10px 0;
border: 1px solid #ddd;
border-radius: 5px;
}
#results {
margin-top: 20px;
}
pre {
background-color: #f8f9fa;
padding: 10px;
border-radius: 5px;
overflow-x: auto;
}
</style>
</head>
<body>
<h1>CORS Test for Talk2Me API</h1>
<p>This page tests CORS configuration for the Talk2Me API. Open this file from a different origin (e.g., file:// or a different port) to test cross-origin requests.</p>
<div>
<label for="apiUrl">API Base URL:</label>
<input type="text" id="apiUrl" placeholder="http://localhost:5005" value="http://localhost:5005">
</div>
<h2>Tests:</h2>
<button onclick="testHealthEndpoint()">Test Health Endpoint</button>
<button onclick="testPreflightRequest()">Test Preflight Request</button>
<button onclick="testTranscribeEndpoint()">Test Transcribe Endpoint (OPTIONS)</button>
<button onclick="testWithCredentials()">Test With Credentials</button>
<div id="results"></div>
<script>
function addResult(test, success, message, details = null) {
const resultsDiv = document.getElementById('results');
const resultDiv = document.createElement('div');
resultDiv.className = `test-result ${success ? 'success' : 'error'}`;
let html = `<strong>${test}:</strong> ${message}`;
if (details) {
html += `<pre>${JSON.stringify(details, null, 2)}</pre>`;
}
resultDiv.innerHTML = html;
resultsDiv.appendChild(resultDiv);
}
function getApiUrl() {
return document.getElementById('apiUrl').value.trim();
}
async function testHealthEndpoint() {
const apiUrl = getApiUrl();
try {
const response = await fetch(`${apiUrl}/health`, {
method: 'GET',
mode: 'cors',
headers: {
'Origin': window.location.origin
}
});
const data = await response.json();
// Check CORS headers
const corsHeaders = {
'Access-Control-Allow-Origin': response.headers.get('Access-Control-Allow-Origin'),
'Access-Control-Allow-Credentials': response.headers.get('Access-Control-Allow-Credentials')
};
addResult('Health Endpoint GET', true, 'Request successful', {
status: response.status,
data: data,
corsHeaders: corsHeaders
});
} catch (error) {
addResult('Health Endpoint GET', false, error.message);
}
}
async function testPreflightRequest() {
const apiUrl = getApiUrl();
try {
const response = await fetch(`${apiUrl}/api/push-public-key`, {
method: 'OPTIONS',
mode: 'cors',
headers: {
'Origin': window.location.origin,
'Access-Control-Request-Method': 'GET',
'Access-Control-Request-Headers': 'content-type'
}
});
const corsHeaders = {
'Access-Control-Allow-Origin': response.headers.get('Access-Control-Allow-Origin'),
'Access-Control-Allow-Methods': response.headers.get('Access-Control-Allow-Methods'),
'Access-Control-Allow-Headers': response.headers.get('Access-Control-Allow-Headers'),
'Access-Control-Max-Age': response.headers.get('Access-Control-Max-Age')
};
addResult('Preflight Request', response.ok, `Status: ${response.status}`, corsHeaders);
} catch (error) {
addResult('Preflight Request', false, error.message);
}
}
async function testTranscribeEndpoint() {
const apiUrl = getApiUrl();
try {
const response = await fetch(`${apiUrl}/transcribe`, {
method: 'OPTIONS',
mode: 'cors',
headers: {
'Origin': window.location.origin,
'Access-Control-Request-Method': 'POST',
'Access-Control-Request-Headers': 'content-type'
}
});
const corsHeaders = {
'Access-Control-Allow-Origin': response.headers.get('Access-Control-Allow-Origin'),
'Access-Control-Allow-Methods': response.headers.get('Access-Control-Allow-Methods'),
'Access-Control-Allow-Headers': response.headers.get('Access-Control-Allow-Headers'),
'Access-Control-Allow-Credentials': response.headers.get('Access-Control-Allow-Credentials')
};
addResult('Transcribe Endpoint OPTIONS', response.ok, `Status: ${response.status}`, corsHeaders);
} catch (error) {
addResult('Transcribe Endpoint OPTIONS', false, error.message);
}
}
async function testWithCredentials() {
const apiUrl = getApiUrl();
try {
const response = await fetch(`${apiUrl}/health`, {
method: 'GET',
mode: 'cors',
credentials: 'include',
headers: {
'Origin': window.location.origin
}
});
const data = await response.json();
addResult('Request with Credentials', true, 'Request successful', {
status: response.status,
credentialsIncluded: true,
data: data
});
} catch (error) {
addResult('Request with Credentials', false, error.message);
}
}
// Clear results before running new tests
function clearResults() {
document.getElementById('results').innerHTML = '';
}
// Add event listeners
document.querySelectorAll('button').forEach(button => {
button.addEventListener('click', (e) => {
if (!e.target.textContent.includes('Test')) return;
clearResults();
});
});
// Show current origin
window.addEventListener('load', () => {
const info = document.createElement('div');
info.style.marginBottom = '20px';
info.style.padding = '10px';
info.style.backgroundColor = '#e9ecef';
info.style.borderRadius = '5px';
info.innerHTML = `<strong>Current Origin:</strong> ${window.location.origin}<br>
<strong>Protocol:</strong> ${window.location.protocol}<br>
<strong>Note:</strong> For effective CORS testing, open this file from a different origin than your API server.`;
document.body.insertBefore(info, document.querySelector('h2'));
});
</script>
</body>
</html>


@@ -1,168 +0,0 @@
#!/usr/bin/env python3
"""
Test script for error logging system
"""
import logging
import json
import os
import time
from error_logger import ErrorLogger, log_errors, log_performance, get_logger
def test_basic_logging():
    """Test basic logging functionality"""
    print("\n=== Testing Basic Logging ===")
    # Get logger
    logger = get_logger('test')
    # Test different log levels
    logger.debug("This is a debug message")
    logger.info("This is an info message")
    logger.warning("This is a warning message")
    logger.error("This is an error message")
    print("✓ Basic logging test completed")

def test_error_logging():
    """Test error logging with exceptions"""
    print("\n=== Testing Error Logging ===")
    @log_errors('test.functions')
    def failing_function():
        raise ValueError("This is a test error")
    try:
        failing_function()
    except ValueError:
        print("✓ Error was logged")
    # Check if error log exists
    if os.path.exists('logs/errors.log'):
        print("✓ Error log file created")
        # Read last line
        with open('logs/errors.log', 'r') as f:
            lines = f.readlines()
            if lines:
                try:
                    error_entry = json.loads(lines[-1])
                    print(f"✓ Error logged with level: {error_entry.get('level')}")
                    print(f"✓ Error type: {error_entry.get('exception', {}).get('type')}")
                except json.JSONDecodeError:
                    print("✗ Error log entry is not valid JSON")
    else:
        print("✗ Error log file not created")

def test_performance_logging():
    """Test performance logging"""
    print("\n=== Testing Performance Logging ===")
    @log_performance('test_operation')
    def slow_function():
        time.sleep(0.1)  # Simulate slow operation
        return "result"
    result = slow_function()
    print(f"✓ Function returned: {result}")
    # Check performance log
    if os.path.exists('logs/performance.log'):
        print("✓ Performance log file created")
        # Read last line
        with open('logs/performance.log', 'r') as f:
            lines = f.readlines()
            if lines:
                try:
                    perf_entry = json.loads(lines[-1])
                    duration = perf_entry.get('extra_fields', {}).get('duration_ms', 0)
                    print(f"✓ Performance logged with duration: {duration}ms")
                except json.JSONDecodeError:
                    print("✗ Performance log entry is not valid JSON")
    else:
        print("✗ Performance log file not created")

def test_structured_logging():
    """Test structured logging format"""
    print("\n=== Testing Structured Logging ===")
    logger = get_logger('test.structured')
    # Log with extra fields
    logger.info("Structured log test", extra={
        'extra_fields': {
            'user_id': 123,
            'action': 'test_action',
            'metadata': {'key': 'value'}
        }
    })
    # Check main log
    if os.path.exists('logs/talk2me.log'):
        with open('logs/talk2me.log', 'r') as f:
            lines = f.readlines()
            if lines:
                try:
                    # Find our test entry
                    for line in reversed(lines):
                        entry = json.loads(line)
                        if entry.get('message') == 'Structured log test':
                            print("✓ Structured log entry found")
                            print(f"✓ Contains timestamp: {'timestamp' in entry}")
                            print(f"✓ Contains hostname: {'hostname' in entry}")
                            print(f"✓ Contains extra fields: {'user_id' in entry}")
                            break
                except json.JSONDecodeError:
                    print("✗ Log entry is not valid JSON")

def test_log_rotation():
    """Test log rotation settings"""
    print("\n=== Testing Log Rotation ===")
    # Check if log files exist and their sizes
    log_files = {
        'talk2me.log': 'logs/talk2me.log',
        'errors.log': 'logs/errors.log',
        'access.log': 'logs/access.log',
        'security.log': 'logs/security.log',
        'performance.log': 'logs/performance.log'
    }
    for name, path in log_files.items():
        if os.path.exists(path):
            size = os.path.getsize(path)
            print(f"{name}: {size} bytes")
        else:
            print(f"- {name}: not created yet")

def main():
    """Run all tests"""
    print("Error Logging System Tests")
    print("==========================")
    # Create a test Flask app
    from flask import Flask
    app = Flask(__name__)
    app.config['LOG_LEVEL'] = 'DEBUG'
    app.config['FLASK_ENV'] = 'testing'
    # Initialize error logger
    error_logger = ErrorLogger(app)
    # Run tests
    test_basic_logging()
    test_error_logging()
    test_performance_logging()
    test_structured_logging()
    test_log_rotation()
    print("\n✅ All tests completed!")
    print("\nCheck the logs directory for generated log files:")
    print("- logs/talk2me.log - Main application log")
    print("- logs/errors.log - Error log with stack traces")
    print("- logs/performance.log - Performance metrics")
    print("- logs/access.log - HTTP access log")
    print("- logs/security.log - Security events")

if __name__ == "__main__":
    main()


@@ -1,264 +0,0 @@
#!/usr/bin/env python3
"""
Unit tests for session management system
"""
import unittest
import tempfile
import shutil
import time
import os
from session_manager import SessionManager, UserSession, SessionResource
from flask import Flask, g, session


class TestSessionManager(unittest.TestCase):
    def setUp(self):
        """Set up test fixtures"""
        self.temp_dir = tempfile.mkdtemp()
        self.config = {
            'max_session_duration': 3600,
            'max_idle_time': 900,
            'max_resources_per_session': 5,  # Small limit for testing
            'max_bytes_per_session': 1024 * 1024,  # 1MB for testing
            'cleanup_interval': 1,  # 1 second for faster testing
            'session_storage_path': self.temp_dir
        }
        self.manager = SessionManager(self.config)

    def tearDown(self):
        """Clean up test fixtures"""
        shutil.rmtree(self.temp_dir, ignore_errors=True)

    def test_create_session(self):
        """Test session creation"""
        session = self.manager.create_session(
            session_id='test-123',
            user_id='user-1',
            ip_address='127.0.0.1',
            user_agent='Test Agent'
        )
        self.assertEqual(session.session_id, 'test-123')
        self.assertEqual(session.user_id, 'user-1')
        self.assertEqual(session.ip_address, '127.0.0.1')
        self.assertEqual(session.user_agent, 'Test Agent')
        self.assertEqual(len(session.resources), 0)

    def test_get_session(self):
        """Test session retrieval"""
        self.manager.create_session(session_id='test-456')
        session = self.manager.get_session('test-456')
        self.assertIsNotNone(session)
        self.assertEqual(session.session_id, 'test-456')

        # Non-existent session
        session = self.manager.get_session('non-existent')
        self.assertIsNone(session)

    def test_add_resource(self):
        """Test adding resources to a session"""
        self.manager.create_session(session_id='test-789')

        # Add a resource
        resource = self.manager.add_resource(
            session_id='test-789',
            resource_type='audio_file',
            resource_id='audio-1',
            path='/tmp/test.wav',
            size_bytes=1024,
            metadata={'format': 'wav'}
        )
        self.assertIsNotNone(resource)
        self.assertEqual(resource.resource_id, 'audio-1')
        self.assertEqual(resource.resource_type, 'audio_file')
        self.assertEqual(resource.size_bytes, 1024)

        # Check the session was updated
        session = self.manager.get_session('test-789')
        self.assertEqual(len(session.resources), 1)
        self.assertEqual(session.total_bytes_used, 1024)

    def test_resource_limits(self):
        """Test resource limit enforcement"""
        self.manager.create_session(session_id='test-limits')

        # Add resources up to the limit
        for i in range(5):
            self.manager.add_resource(
                session_id='test-limits',
                resource_type='temp_file',
                resource_id=f'file-{i}',
                size_bytes=100
            )
        session = self.manager.get_session('test-limits')
        self.assertEqual(len(session.resources), 5)

        # Add one more - should evict the oldest
        self.manager.add_resource(
            session_id='test-limits',
            resource_type='temp_file',
            resource_id='file-new',
            size_bytes=100
        )
        session = self.manager.get_session('test-limits')
        self.assertEqual(len(session.resources), 5)  # Still 5
        self.assertNotIn('file-0', session.resources)  # Oldest removed
        self.assertIn('file-new', session.resources)  # New one added

    def test_size_limits(self):
        """Test size limit enforcement"""
        self.manager.create_session(session_id='test-size')

        # Add a large resource
        self.manager.add_resource(
            session_id='test-size',
            resource_type='audio_file',
            resource_id='large-1',
            size_bytes=500 * 1024  # 500KB
        )

        # Add another large resource
        self.manager.add_resource(
            session_id='test-size',
            resource_type='audio_file',
            resource_id='large-2',
            size_bytes=600 * 1024  # 600KB - would exceed the 1MB limit
        )
        session = self.manager.get_session('test-size')

        # The first resource should be removed to make space
        self.assertNotIn('large-1', session.resources)
        self.assertIn('large-2', session.resources)
        self.assertLessEqual(session.total_bytes_used, 1024 * 1024)

    def test_remove_resource(self):
        """Test resource removal"""
        self.manager.create_session(session_id='test-remove')
        self.manager.add_resource(
            session_id='test-remove',
            resource_type='temp_file',
            resource_id='to-remove',
            size_bytes=1000
        )

        # Remove the resource
        success = self.manager.remove_resource('test-remove', 'to-remove')
        self.assertTrue(success)

        # Check it's gone
        session = self.manager.get_session('test-remove')
        self.assertEqual(len(session.resources), 0)
        self.assertEqual(session.total_bytes_used, 0)

    def test_cleanup_session(self):
        """Test session cleanup"""
        # Create a session with resources
        self.manager.create_session(session_id='test-cleanup')

        # Create an actual temp file
        temp_file = os.path.join(self.temp_dir, 'test-file.txt')
        with open(temp_file, 'w') as f:
            f.write('test content')
        self.manager.add_resource(
            session_id='test-cleanup',
            resource_type='temp_file',
            path=temp_file,
            size_bytes=12
        )

        # Clean up the session
        success = self.manager.cleanup_session('test-cleanup')
        self.assertTrue(success)

        # Check the session is gone
        session = self.manager.get_session('test-cleanup')
        self.assertIsNone(session)

        # Check the file was deleted
        self.assertFalse(os.path.exists(temp_file))

    def test_session_info(self):
        """Test session info retrieval"""
        self.manager.create_session(
            session_id='test-info',
            ip_address='192.168.1.1'
        )
        self.manager.add_resource(
            session_id='test-info',
            resource_type='audio_file',
            size_bytes=2048
        )
        info = self.manager.get_session_info('test-info')
        self.assertIsNotNone(info)
        self.assertEqual(info['session_id'], 'test-info')
        self.assertEqual(info['ip_address'], '192.168.1.1')
        self.assertEqual(info['resource_count'], 1)
        self.assertEqual(info['total_bytes_used'], 2048)

    def test_stats(self):
        """Test statistics calculation"""
        # Create multiple sessions
        for i in range(3):
            self.manager.create_session(session_id=f'test-stats-{i}')
            self.manager.add_resource(
                session_id=f'test-stats-{i}',
                resource_type='temp_file',
                size_bytes=1000
            )
        stats = self.manager.get_stats()
        self.assertEqual(stats['active_sessions'], 3)
        self.assertEqual(stats['active_resources'], 3)
        self.assertEqual(stats['active_bytes'], 3000)
        self.assertEqual(stats['total_sessions_created'], 3)

    def test_metrics_export(self):
        """Test metrics export"""
        self.manager.create_session(session_id='test-metrics')
        metrics = self.manager.export_metrics()
        self.assertIn('sessions', metrics)
        self.assertIn('resources', metrics)
        self.assertIn('limits', metrics)
        self.assertEqual(metrics['sessions']['active'], 1)


class TestFlaskIntegration(unittest.TestCase):
    def setUp(self):
        """Set up a Flask app for testing"""
        self.app = Flask(__name__)
        self.app.config['TESTING'] = True
        self.app.config['SECRET_KEY'] = 'test-secret'
        self.temp_dir = tempfile.mkdtemp()
        self.app.config['UPLOAD_FOLDER'] = self.temp_dir

        # Initialize the session manager
        from session_manager import init_app
        init_app(self.app)

        self.client = self.app.test_client()
        self.ctx = self.app.test_request_context()
        self.ctx.push()

    def tearDown(self):
        """Clean up"""
        self.ctx.pop()
        shutil.rmtree(self.temp_dir, ignore_errors=True)

    def test_before_request_handler(self):
        """Test Flask before_request integration"""
        with self.client:
            # Make a request
            response = self.client.get('/')

            # A session should have been created
            with self.client.session_transaction() as sess:
                self.assertIn('session_id', sess)


if __name__ == '__main__':
    unittest.main()


@@ -1,146 +0,0 @@
#!/usr/bin/env python3
"""
Test script for request size limits
"""
import requests
import json
import io
import os

BASE_URL = "http://localhost:5005"


def test_json_size_limit():
    """Test the JSON payload size limit"""
    print("\n=== Testing JSON Size Limit ===")

    # Create a large JSON payload (over 1MB)
    large_data = {
        "text": "x" * (2 * 1024 * 1024),  # 2MB of text
        "source_lang": "English",
        "target_lang": "Spanish"
    }
    try:
        response = requests.post(f"{BASE_URL}/translate", json=large_data)
        print(f"Status: {response.status_code}")
        if response.status_code == 413:
            print(f"✓ Correctly rejected large JSON: {response.json()}")
        else:
            print("✗ Should have rejected large JSON")
    except Exception as e:
        print(f"Error: {e}")


def test_audio_size_limit():
    """Test the audio file size limit"""
    print("\n=== Testing Audio Size Limit ===")

    # Create a fake large audio file (over 25MB)
    large_audio = io.BytesIO(b"x" * (30 * 1024 * 1024))  # 30MB
    files = {
        'audio': ('large_audio.wav', large_audio, 'audio/wav')
    }
    data = {
        'source_lang': 'English'
    }
    try:
        response = requests.post(f"{BASE_URL}/transcribe", files=files, data=data)
        print(f"Status: {response.status_code}")
        if response.status_code == 413:
            print(f"✓ Correctly rejected large audio: {response.json()}")
        else:
            print("✗ Should have rejected large audio")
    except Exception as e:
        print(f"Error: {e}")


def test_valid_requests():
    """Test that valid-sized requests are accepted"""
    print("\n=== Testing Valid Size Requests ===")

    # Small JSON payload
    small_data = {
        "text": "Hello world",
        "source_lang": "English",
        "target_lang": "Spanish"
    }
    try:
        response = requests.post(f"{BASE_URL}/translate", json=small_data)
        print(f"Small JSON - Status: {response.status_code}")
        if response.status_code != 413:
            print("✓ Small JSON accepted")
        else:
            print("✗ Small JSON should be accepted")
    except Exception as e:
        print(f"Error: {e}")

    # Small audio file
    small_audio = io.BytesIO(b"RIFF" + b"x" * 1000)  # 1KB fake WAV
    files = {
        'audio': ('small_audio.wav', small_audio, 'audio/wav')
    }
    data = {
        'source_lang': 'English'
    }
    try:
        response = requests.post(f"{BASE_URL}/transcribe", files=files, data=data)
        print(f"Small audio - Status: {response.status_code}")
        if response.status_code != 413:
            print("✓ Small audio accepted")
        else:
            print("✗ Small audio should be accepted")
    except Exception as e:
        print(f"Error: {e}")


def test_admin_endpoints():
    """Test admin endpoints for size limits"""
    print("\n=== Testing Admin Endpoints ===")
    headers = {'X-Admin-Token': os.environ.get('ADMIN_TOKEN', 'default-admin-token')}

    # Get the current limits
    try:
        response = requests.get(f"{BASE_URL}/admin/size-limits", headers=headers)
        print(f"Get limits - Status: {response.status_code}")
        if response.status_code == 200:
            limits = response.json()
            print(f"✓ Current limits: {limits['limits_human']}")
        else:
            print(f"✗ Failed to get limits: {response.text}")
    except Exception as e:
        print(f"Error: {e}")

    # Update the limits
    new_limits = {
        "max_audio_size": "30MB",
        "max_json_size": 2097152  # 2MB in bytes
    }
    try:
        response = requests.post(f"{BASE_URL}/admin/size-limits",
                                 json=new_limits, headers=headers)
        print(f"\nUpdate limits - Status: {response.status_code}")
        if response.status_code == 200:
            result = response.json()
            print(f"✓ Updated limits: {result['new_limits_human']}")
        else:
            print(f"✗ Failed to update limits: {response.text}")
    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    print("Request Size Limit Tests")
    print("========================")
    print(f"Testing against: {BASE_URL}")
    print("\nMake sure the Flask app is running on port 5005")
    input("\nPress Enter to start tests...")

    test_valid_requests()
    test_json_size_limit()
    test_audio_size_limit()
    test_admin_endpoints()

    print("\n✅ All tests completed!")


@@ -1,78 +0,0 @@
#!/usr/bin/env python
"""
TTS Debug Script - Tests the connection to the OpenAI TTS server
"""
import os
import sys
import json
import requests
from argparse import ArgumentParser


def test_tts_connection(server_url, api_key, text="Hello, this is a test message"):
    """Test the connection to the TTS server"""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }
    payload = {
        "input": text,
        "voice": "echo",
        "response_format": "mp3",
        "speed": 1.0
    }
    print(f"Sending request to: {server_url}")
    print(f"Headers: {headers}")
    print(f"Payload: {json.dumps(payload, indent=2)}")

    try:
        response = requests.post(
            server_url,
            headers=headers,
            json=payload,
            timeout=15
        )
        print(f"Response status code: {response.status_code}")
        if response.status_code == 200:
            print("Success! Received audio data")

            # Save the audio to a file
            output_file = "tts_test_output.mp3"
            with open(output_file, "wb") as f:
                f.write(response.content)
            print(f"Saved audio to {output_file}")
            return True
        else:
            print("Error in response")
            try:
                error_data = response.json()
                print(f"Error details: {json.dumps(error_data, indent=2)}")
            except ValueError:
                # Body was not valid JSON - show the raw text instead
                print(f"Raw response: {response.text[:500]}")
            return False
    except Exception as e:
        print(f"Error during request: {e}")
        return False


def main():
    parser = ArgumentParser(description="Test connection to an OpenAI TTS server")
    parser.add_argument("--url", default="http://localhost:5050/v1/audio/speech", help="TTS server URL")
    parser.add_argument("--key", default=os.environ.get("TTS_API_KEY", ""), help="API key")
    parser.add_argument("--text", default="Hello, this is a test message", help="Text to synthesize")
    args = parser.parse_args()

    if not args.key:
        print("Error: API key is required. Use the --key argument or set the TTS_API_KEY environment variable.")
        return 1

    success = test_tts_connection(args.url, args.key, args.text)
    return 0 if success else 1


if __name__ == "__main__":
    sys.exit(main())

Binary file not shown.