# Rate Limiting Documentation

This document describes the rate limiting implementation in Talk2Me, which protects against DoS attacks and resource exhaustion.

## Overview

Talk2Me implements a comprehensive rate limiting system with:

- Token bucket algorithm with sliding window
- Per-endpoint configurable limits
- IP-based blocking (temporary and permanent)
- Global request limits
- Concurrent request throttling
- Request size validation

## Rate Limits by Endpoint

### Transcription (`/transcribe`)

- **Per Minute**: 10 requests
- **Per Hour**: 100 requests
- **Burst Size**: 3 requests
- **Max Request Size**: 10MB
- **Token Refresh**: 1 token per 6 seconds

### Translation (`/translate`)

- **Per Minute**: 20 requests
- **Per Hour**: 300 requests
- **Burst Size**: 5 requests
- **Max Request Size**: 100KB
- **Token Refresh**: 1 token per 3 seconds

### Streaming Translation (`/translate/stream`)

- **Per Minute**: 10 requests
- **Per Hour**: 150 requests
- **Burst Size**: 3 requests
- **Max Request Size**: 100KB
- **Token Refresh**: 1 token per 6 seconds

### Text-to-Speech (`/speak`)

- **Per Minute**: 15 requests
- **Per Hour**: 200 requests
- **Burst Size**: 3 requests
- **Max Request Size**: 50KB
- **Token Refresh**: 1 token per 4 seconds

### API Endpoints

- Push notifications, error logging: Various limits (see code)

## Global Limits

- **Total Requests Per Minute**: 1,000 (across all endpoints)
- **Total Requests Per Hour**: 10,000
- **Concurrent Requests**: 50 maximum

## Rate Limiting Headers

Successful responses include:

```
X-RateLimit-Limit: 20
X-RateLimit-Remaining: 15
X-RateLimit-Reset: 1234567890
```

Rate-limited responses (429) include:

```
X-RateLimit-Limit: 20
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1234567890
Retry-After: 60
```

## Client Identification

Clients are identified by:

- IP address (including X-Forwarded-For support)
- User-Agent string
- Combined hash for uniqueness

## Automatic Blocking

IPs are temporarily blocked for 1 hour if:

- They exceed 100 requests per minute
- They repeatedly hit rate limits
- They exhibit suspicious patterns

## Configuration

### Environment Variables

```bash
# No direct environment variables for rate limiting
# Configured in code - can be extended to use env vars
```

### Programmatic Configuration

Rate limits can be adjusted in `rate_limiter.py`:

```python
self.endpoint_limits = {
    '/transcribe': {
        'requests_per_minute': 10,
        'requests_per_hour': 100,
        'burst_size': 3,
        'token_refresh_rate': 0.167,  # about 1 token per 6 seconds
        'max_request_size': 10 * 1024 * 1024  # 10MB
    }
}
```

## Admin Endpoints

### Get Rate Limit Configuration

```bash
curl -H "X-Admin-Token: your-admin-token" \
  http://localhost:5005/admin/rate-limits
```

### Get Rate Limit Statistics

```bash
# Global stats
curl -H "X-Admin-Token: your-admin-token" \
  http://localhost:5005/admin/rate-limits/stats

# Client-specific stats (URL quoted so the shell does not treat "?" as a glob)
curl -H "X-Admin-Token: your-admin-token" \
  "http://localhost:5005/admin/rate-limits/stats?client_id=abc123"
```

### Block IP Address

```bash
# Temporary block (1 hour)
curl -X POST -H "X-Admin-Token: your-admin-token" \
  -H "Content-Type: application/json" \
  -d '{"ip": "192.168.1.100", "duration": 3600}' \
  http://localhost:5005/admin/block-ip

# Permanent block
curl -X POST -H "X-Admin-Token: your-admin-token" \
  -H "Content-Type: application/json" \
  -d '{"ip": "192.168.1.100", "permanent": true}' \
  http://localhost:5005/admin/block-ip
```

## Algorithm Details

### Token Bucket

- Each client gets a bucket with a configurable burst size
- Tokens regenerate at a fixed rate
- Each request consumes a token
- An empty bucket means the request is denied

### Sliding Window

- Tracks requests in the last minute and hour
- More accurate than fixed windows
- Prevents gaming the system at window boundaries

## Best Practices

### For Users

1. Implement exponential backoff when receiving 429 errors
2. Check rate limit headers to avoid hitting limits
3. Cache responses when possible
4. Use bulk operations where available

### For Administrators

1. Monitor rate limit statistics regularly
2. Adjust limits based on usage patterns
3. Use IP blocking sparingly
4. Set up alerts for suspicious activity

## Error Responses

### Rate Limited (429)

```json
{
  "error": "Rate limit exceeded (per minute)",
  "retry_after": 60
}
```

### Request Too Large (413)

```json
{
  "error": "Request too large"
}
```

### IP Blocked (429)

```json
{
  "error": "IP temporarily blocked due to excessive requests"
}
```

## Monitoring

Key metrics to monitor:

- Rate limit hits by endpoint
- Blocked IPs
- Concurrent request peaks
- Request size violations
- Global limit approaches

## Performance Impact

- Minimal overhead (~1-2ms per request)
- Memory usage scales with active clients
- Automatic cleanup of old buckets
- Thread-safe implementation

## Security Considerations

1. **DoS Protection**: Prevents resource exhaustion
2. **Burst Control**: Limits sudden traffic spikes
3. **Size Validation**: Prevents large payload attacks
4. **IP Blocking**: Stops persistent attackers
5. **Global Limits**: Protects overall system capacity

## Troubleshooting

### "Rate limit exceeded" errors

- Check client request patterns
- Verify time synchronization
- Look for retry loops
- Check IP blocking status

### Memory usage increasing

- Verify cleanup thread is running
- Check for client ID explosion
- Monitor bucket count

### Legitimate users blocked

- Review rate limit settings
- Check for shared IP issues
- Implement IP whitelisting if needed
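
## Example: Token Bucket Sketch

The token bucket behavior listed under Algorithm Details (a bucket per client, refill at a fixed rate, one token per request, denial when empty) can be illustrated with a minimal sketch. This is not the code in `rate_limiter.py`: the class name `TokenBucket`, its parameters, and the optional `now` argument (added so the clock can be controlled in tests) are assumptions made for illustration.

```python
import time


class TokenBucket:
    """Illustrative token bucket; a sketch, not the Talk2Me implementation."""

    def __init__(self, burst_size, refresh_rate, now=None):
        self.capacity = burst_size        # maximum tokens the bucket can hold
        self.refresh_rate = refresh_rate  # tokens added per second
        self.tokens = float(burst_size)   # start full so an initial burst is allowed
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Consume one token if available; return True when the request may proceed."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last_refill)
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refresh_rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical usage with the /translate numbers: burst of 5, 1 token per 3 seconds.
bucket = TokenBucket(burst_size=5, refresh_rate=1 / 3)
if not bucket.allow():
    print("request would be rejected with 429")
```

A production limiter would also need one bucket per client ID and a lock around `allow`, since the document describes the real implementation as thread-safe.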
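
## Example: Sliding Window Sketch

The sliding window described under Algorithm Details tracks requests over the trailing minute and hour rather than in fixed clock-aligned windows. One common way to do this is a timestamp log, sketched below. Whether Talk2Me uses a log or an approximate counter is not stated in this document, so treat `SlidingWindowLog` and its parameters as hypothetical.

```python
from collections import deque


class SlidingWindowLog:
    """Sliding-window log: keeps one timestamp per accepted request."""

    def __init__(self, limit, window_seconds):
        self.limit = limit            # maximum requests allowed inside the window
        self.window = window_seconds  # window length in seconds
        self.timestamps = deque()     # accepted request times, oldest first

    def allow(self, now):
        """Return True and record the request if the window still has room."""
        # Drop timestamps that have aged out of the trailing window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


# Hypothetical usage with the /transcribe per-minute limit of 10 requests.
per_minute = SlidingWindowLog(limit=10, window_seconds=60)
```

Because the window moves with each request, a client cannot send a full quota just before a boundary and another full quota just after it, which is the boundary gaming that fixed windows allow.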
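
## Example: Client Backoff Sketch

The first best practice for users, exponential backoff on 429 responses, can be combined with the `Retry-After` header shown under Rate Limiting Headers. The sketch below is one possible client-side approach. The function names, the `send` callable returning `(status_code, headers, body)`, and the default `base` and `cap` values are all assumptions for illustration, not part of Talk2Me.

```python
import random
import time


def backoff_delay(attempt, headers, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based).

    Prefers the server's Retry-After value; otherwise falls back to
    capped exponential backoff with full jitter.
    """
    # Assumes `headers` is a plain dict keyed exactly as "Retry-After".
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Not a number of seconds (e.g. an HTTP date); fall back below.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_backoff(send, max_attempts=5, sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (status_code, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, headers))
    return status, body
```

Random jitter spreads retries out so that many clients limited at the same moment do not all retry together.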