talk2me/RATE_LIMITING.md
Adolfo Delorenzo a4ef775731 Implement comprehensive rate limiting to protect against DoS attacks
- Add token bucket rate limiter with sliding window algorithm
- Implement per-endpoint configurable rate limits
- Add automatic IP blocking for excessive requests
- Implement global request limits and concurrent request throttling
- Add request size validation for all endpoints
- Create admin endpoints for rate limit management
- Add rate limit headers to responses
- Implement cleanup thread for old rate limit buckets
- Create detailed rate limiting documentation

Rate limits:
- Transcription: 10/min, 100/hour, max 10MB
- Translation: 20/min, 300/hour, max 100KB
- Streaming: 10/min, 150/hour, max 100KB
- TTS: 15/min, 200/hour, max 50KB
- Global: 1000/min, 10000/hour, 50 concurrent

Security features:
- Automatic temporary IP blocking (1 hour) for abuse
- Manual IP blocking via admin endpoint
- Request size validation to prevent large payload attacks
- Burst control to limit sudden traffic spikes

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-03 00:14:05 -06:00


Rate Limiting Documentation

This document describes Talk2Me's rate limiting implementation, which protects against DoS attacks and resource exhaustion.

Overview

Talk2Me implements a comprehensive rate limiting system with:

  • Token bucket algorithm with sliding window
  • Per-endpoint configurable limits
  • IP-based blocking (temporary and permanent)
  • Global request limits
  • Concurrent request throttling
  • Request size validation

Rate Limits by Endpoint

Transcription (/transcribe)

  • Per Minute: 10 requests
  • Per Hour: 100 requests
  • Burst Size: 3 requests
  • Max Request Size: 10MB
  • Token Refresh: 1 token per 6 seconds

Translation (/translate)

  • Per Minute: 20 requests
  • Per Hour: 300 requests
  • Burst Size: 5 requests
  • Max Request Size: 100KB
  • Token Refresh: 1 token per 3 seconds

Streaming Translation (/translate/stream)

  • Per Minute: 10 requests
  • Per Hour: 150 requests
  • Burst Size: 3 requests
  • Max Request Size: 100KB
  • Token Refresh: 1 token per 6 seconds

Text-to-Speech (/speak)

  • Per Minute: 15 requests
  • Per Hour: 200 requests
  • Burst Size: 3 requests
  • Max Request Size: 50KB
  • Token Refresh: 1 token per 4 seconds

API Endpoints

  • Push notifications and error logging: limits vary by endpoint (see rate_limiter.py)

Global Limits

  • Total Requests Per Minute: 1,000 (across all endpoints)
  • Total Requests Per Hour: 10,000
  • Concurrent Requests: 50 maximum
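The concurrent-request cap is sketched below for a thread-per-request server. `ConcurrencyLimiter` and `TooManyConcurrentRequests` are hypothetical names, not taken from rate_limiter.py. This document does not say which status code an over-limit request receives, so the sketch only raises an exception and leaves the response to the caller.

```python
import threading
from contextlib import contextmanager
from typing import Iterator

MAX_CONCURRENT_REQUESTS = 50  # "Concurrent Requests: 50 maximum"


class TooManyConcurrentRequests(Exception):
    """Raised when every slot is already in use."""


class ConcurrencyLimiter:
    # Hypothetical helper (not from rate_limiter.py).
    def __init__(self, max_concurrent: int = MAX_CONCURRENT_REQUESTS) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    @contextmanager
    def slot(self) -> Iterator[None]:
        # Non-blocking acquire: reject immediately instead of queueing.
        if not self._slots.acquire(blocking=False):
            raise TooManyConcurrentRequests("concurrent request limit reached")
        try:
            yield
        finally:
            self._slots.release()  # free the slot even if the handler raises
```

A request handler would run inside `with limiter.slot():`. Failing fast keeps a flood of slow requests from piling up threads behind the limit.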

Rate Limiting Headers

Successful responses include:

X-RateLimit-Limit: 20
X-RateLimit-Remaining: 15
X-RateLimit-Reset: 1234567890

Rate-limited responses (HTTP 429) include:

X-RateLimit-Limit: 20
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1234567890
Retry-After: 60
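The headers above could be assembled with a small helper like the one below. This is a hypothetical sketch, not the Talk2Me code. It assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the example value suggests but this document does not state.

```python
from typing import Dict, Optional


def rate_limit_headers(limit: int, remaining: int, reset_at: float,
                       retry_after: Optional[int] = None) -> Dict[str, str]:
    """Hypothetical helper (not from rate_limiter.py) building X-RateLimit-* headers.

    reset_at is assumed to be a Unix timestamp in seconds.
    """
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),  # never report negatives
        "X-RateLimit-Reset": str(int(reset_at)),
    }
    if retry_after is not None:
        # Only 429 responses carry Retry-After (seconds to wait).
        headers["Retry-After"] = str(retry_after)
    return headers
```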

Client Identification

Clients are identified by:

  • IP address (including X-Forwarded-For support)
  • User-Agent string
  • A hash of both values combined, used as the client identifier
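One way to derive such an identifier is sketched below. It assumes SHA-256 as the hash, a `|` separator, and the left-most `X-Forwarded-For` entry as the client IP. `client_id` is a hypothetical helper, and a plain `dict` stands in for a framework's header object, which is normally case-insensitive. The client supplies `X-Forwarded-For` unless a trusted proxy overwrites it, so the header should only be honored behind a proxy you control.

```python
import hashlib
from typing import Mapping


def client_id(remote_addr: str, headers: Mapping[str, str]) -> str:
    """Hypothetical helper (not from rate_limiter.py): hash of client IP + User-Agent."""
    forwarded = headers.get("X-Forwarded-For", "")
    # X-Forwarded-For can list several hops; the left-most entry is the
    # original client as reported by the proxy chain.
    ip = forwarded.split(",")[0].strip() if forwarded else remote_addr
    user_agent = headers.get("User-Agent", "")
    return hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()
```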

Automatic Blocking

IPs are temporarily blocked for 1 hour if:

  • They exceed 100 requests per minute
  • They repeatedly hit rate limits
  • They exhibit suspicious patterns
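The first trigger (more than 100 requests in a minute) can be sketched with a per-IP timestamp log and an expiry map. `AutoBlocker` is a hypothetical class. The other two triggers are not modeled because this document does not define them. The optional `now` parameter exists only to make the behavior easy to test.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

BLOCK_THRESHOLD_PER_MINUTE = 100  # "exceed 100 requests per minute"
BLOCK_DURATION_SECONDS = 3600     # "blocked for 1 hour"


class AutoBlocker:
    # Hypothetical sketch (not from rate_limiter.py).
    def __init__(self) -> None:
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self._blocked_until: Dict[str, float] = {}

    def is_blocked(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        until = self._blocked_until.get(ip)
        if until is None:
            return False
        if now >= until:
            del self._blocked_until[ip]  # the block has expired
            return False
        return True

    def record(self, ip: str, now: Optional[float] = None) -> None:
        """Record one request and block the IP if it exceeds the threshold."""
        now = time.time() if now is None else now
        hits = self._hits[ip]
        hits.append(now)
        while hits and hits[0] <= now - 60:  # keep only the last 60 seconds
            hits.popleft()
        if len(hits) > BLOCK_THRESHOLD_PER_MINUTE:
            self._blocked_until[ip] = now + BLOCK_DURATION_SECONDS
```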

Configuration

Environment Variables

# No direct environment variables for rate limiting
# Configured in code - can be extended to use env vars
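If the configuration were extended to read environment variables, a loader might look like the sketch below. The variable name `RATE_LIMIT_TRANSCRIBE_PER_MINUTE` and the `limit_from_env` helper are hypothetical; Talk2Me does not read them today.

```python
import os
from typing import Mapping, Optional


def limit_from_env(name: str, default: int,
                   env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: read a positive integer limit, else use the default."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)  # raises ValueError on malformed input
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value


# Hypothetical variable name; Talk2Me does not currently read it.
TRANSCRIBE_PER_MINUTE = limit_from_env("RATE_LIMIT_TRANSCRIBE_PER_MINUTE", 10)
```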

Programmatic Configuration

Rate limits can be adjusted in rate_limiter.py:

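# Excerpt: the other endpoints follow the same shape
self.endpoint_limits = {
    '/transcribe': {
        'requests_per_minute': 10,      # sliding-window cap per minute
        'requests_per_hour': 100,       # sliding-window cap per hour
        'burst_size': 3,                # token bucket capacity
        'token_refresh_rate': 0.167,    # tokens per second (about 1 every 6 s)
        'max_request_size': 10 * 1024 * 1024  # 10MB
    }
}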
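The `token_refresh_rate` of 0.167 appears to equal `requests_per_minute / 60`, that is, 10 requests per 60 seconds. The "1 token per N seconds" figures in the endpoint tables follow the same rule. Assuming the other endpoints are configured the same way, the values can be computed rather than entered by hand. The helper names below are illustrative.

```python
# Hypothetical helpers (not from rate_limiter.py), assuming the rule
# token_refresh_rate = requests_per_minute / 60.
def token_refresh_rate(requests_per_minute: int) -> float:
    """Tokens regenerated per second."""
    return requests_per_minute / 60.0


def refresh_interval_seconds(requests_per_minute: int) -> float:
    """Seconds between regenerated tokens."""
    return 60.0 / requests_per_minute


PER_MINUTE = {"/transcribe": 10, "/translate": 20,
              "/translate/stream": 10, "/speak": 15}

for endpoint, rpm in PER_MINUTE.items():
    print(f"{endpoint}: {token_refresh_rate(rpm):.3f} tokens/s, "
          f"1 token every {refresh_interval_seconds(rpm):g} s")
```

This prints rates of 0.167, 0.333, 0.167 and 0.250 tokens/s, which is one token every 6, 3, 6 and 4 seconds. These match the tables above.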

Admin Endpoints

Get Rate Limit Configuration

curl -H "X-Admin-Token: your-admin-token" \
  http://localhost:5005/admin/rate-limits

Get Rate Limit Statistics

# Global stats
curl -H "X-Admin-Token: your-admin-token" \
  http://localhost:5005/admin/rate-limits/stats

# Client-specific stats
curl -H "X-Admin-Token: your-admin-token" \
  "http://localhost:5005/admin/rate-limits/stats?client_id=abc123"

Block IP Address

# Temporary block (1 hour)
curl -X POST -H "X-Admin-Token: your-admin-token" \
  -H "Content-Type: application/json" \
  -d '{"ip": "192.168.1.100", "duration": 3600}' \
  http://localhost:5005/admin/block-ip

# Permanent block
curl -X POST -H "X-Admin-Token: your-admin-token" \
  -H "Content-Type: application/json" \
  -d '{"ip": "192.168.1.100", "permanent": true}' \
  http://localhost:5005/admin/block-ip
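The same admin calls can be made from Python using only the standard library. This sketch builds the block-IP request without sending it. `block_ip_request`, `BASE_URL` and `ADMIN_TOKEN` are illustrative names, and the payloads mirror the curl examples above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5005"
ADMIN_TOKEN = "your-admin-token"  # placeholder, as in the curl examples


def block_ip_request(ip: str, duration: int = 3600,
                     permanent: bool = False) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) POST /admin/block-ip."""
    payload = {"ip": ip, "permanent": True} if permanent else {"ip": ip, "duration": duration}
    return urllib.request.Request(
        f"{BASE_URL}/admin/block-ip",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Admin-Token": ADMIN_TOKEN, "Content-Type": "application/json"},
        method="POST",
    )


# To send it to a running server:
# with urllib.request.urlopen(block_ip_request("192.168.1.100")) as resp:
#     print(resp.status, resp.read().decode())
```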

Algorithm Details

Token Bucket

  • Each client gets a bucket with configurable burst size
  • Tokens regenerate at a fixed rate
  • Requests consume tokens
  • Empty bucket = request denied
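The sketch below is a minimal token bucket that follows these four points, for illustration only. The capacity plays the role of `burst_size`, and `rate` plays the role of `token_refresh_rate` (tokens per second). The class name and the injectable `now` argument are assumptions, not the rate_limiter.py API.

```python
import threading
import time
from typing import Optional


class TokenBucket:
    # Hypothetical sketch (not the rate_limiter.py API).
    def __init__(self, capacity: int, rate: float,
                 now: Optional[float] = None) -> None:
        self.capacity = capacity          # burst_size
        self.rate = rate                  # tokens added per second
        self.tokens = float(capacity)     # assume a new client starts full
        self.updated = time.monotonic() if now is None else now
        self._lock = threading.Lock()

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available; return False when the bucket is empty."""
        with self._lock:
            now = time.monotonic() if now is None else now
            # Regenerate tokens for the elapsed time, never beyond capacity.
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.updated = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False
```

With the /transcribe values, `TokenBucket(capacity=3, rate=1/6)` admits three back-to-back requests. After that it admits roughly one every six seconds.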

Sliding Window

  • Tracks requests in the last minute and hour
  • More accurate than fixed windows
  • Prevents gaming the system at window boundaries
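A sliding-window log is the simplest way to illustrate the idea: keep each accepted request's timestamp and count those inside the trailing minute and hour. This is a hypothetical sketch. The actual implementation may count differently, for example with bucketed counters to save memory.

```python
from collections import deque
from typing import Deque


class SlidingWindowLog:
    # Hypothetical sketch (not from rate_limiter.py).
    def __init__(self, per_minute: int, per_hour: int) -> None:
        self.per_minute = per_minute
        self.per_hour = per_hour
        self._log: Deque[float] = deque()  # timestamps of accepted requests

    def allow(self, now: float) -> bool:
        # Forget requests older than one hour.
        while self._log and self._log[0] <= now - 3600:
            self._log.popleft()
        last_minute = sum(1 for t in self._log if t > now - 60)
        if last_minute >= self.per_minute or len(self._log) >= self.per_hour:
            return False
        self._log.append(now)
        return True
```

For example, with a limit of 10 per minute, ten accepted requests at t = 59 s make `allow(60.0)` return False. A fixed window that resets on the minute would have accepted that request, which is the boundary gaming the sliding window prevents.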

Best Practices

For Users

  1. Implement exponential backoff when receiving 429 errors
  2. Check rate limit headers to avoid hitting limits
  3. Cache responses when possible
  4. Use bulk operations where available
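For item 1, a common pattern is capped exponential backoff with random jitter, never waiting less than the server's `Retry-After`. The helper below is a generic sketch with hypothetical defaults (1 s base, 60 s cap). The `rand` parameter is injectable only for testing.

```python
import random
from typing import Callable, Optional


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 60.0,
                  rand: Callable[[], float] = random.random) -> float:
    """Hypothetical helper: seconds to wait before retry `attempt` (0-based).

    Capped exponential backoff with full jitter, but never less than the
    server's Retry-After value.
    """
    ceiling = min(cap, base * (2 ** attempt))
    delay = ceiling * rand()  # full jitter: random.random() is in [0, 1)
    if retry_after is not None:
        delay = max(delay, retry_after)
    return delay
```

A retry loop would call `time.sleep(backoff_delay(attempt, retry_after))` after each 429. The `retry_after` value comes from the `Retry-After` header or the `retry_after` field of the JSON error body.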

For Administrators

  1. Monitor rate limit statistics regularly
  2. Adjust limits based on usage patterns
  3. Use IP blocking sparingly
  4. Set up alerts for suspicious activity

Error Responses

Rate Limited (429)

{
  "error": "Rate limit exceeded (per minute)",
  "retry_after": 60
}

Request Too Large (413)

{
  "error": "Request too large"
}

IP Blocked (429)

{
  "error": "IP temporarily blocked due to excessive requests"
}
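The 413 response corresponds to the per-endpoint size limits listed earlier. A minimal check against the `Content-Length` header might look like the sketch below, where `check_request_size` is hypothetical. KB and MB are taken as 1024-based, which is consistent with `10 * 1024 * 1024  # 10MB` in the configuration excerpt. Requests without a `Content-Length`, such as chunked uploads, would need a separate byte-counting check that the sketch omits.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table built from the per-endpoint limits above, in bytes.
MAX_REQUEST_SIZE = {
    "/transcribe": 10 * 1024 * 1024,   # 10MB
    "/translate": 100 * 1024,          # 100KB
    "/translate/stream": 100 * 1024,   # 100KB
    "/speak": 50 * 1024,               # 50KB
}


def check_request_size(path: str, content_length: Optional[int]
                       ) -> Optional[Tuple[int, Dict[str, str]]]:
    """Hypothetical helper: return (413, error body) if too large, else None."""
    limit = MAX_REQUEST_SIZE.get(path)
    if limit is None or content_length is None:
        return None  # unknown endpoint or no Content-Length: not checked here
    if content_length > limit:
        return 413, {"error": "Request too large"}
    return None
```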

Monitoring

Key metrics to monitor:

  • Rate limit hits by endpoint
  • Blocked IPs
  • Concurrent request peaks
  • Request size violations
  • Global limit approaches

Performance Impact

  • Minimal overhead (~1-2ms per request)
  • Memory usage scales with active clients
  • Automatic cleanup of old buckets
  • Thread-safe implementation
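The automatic cleanup is sketched below as per-client state plus a daemon thread that periodically evicts clients idle longer than a threshold. `BucketStore`, the one-hour idle threshold and the five-minute interval are assumptions. A single last-seen timestamp stands in for the full bucket state. `touch` and the cleanup thread must use the same clock, which is `time.monotonic()` here.

```python
import threading
import time
from typing import Dict


class BucketStore:
    # Hypothetical sketch (not from rate_limiter.py).
    def __init__(self, idle_seconds: float = 3600.0) -> None:
        self.idle_seconds = idle_seconds
        self._last_seen: Dict[str, float] = {}
        self._lock = threading.Lock()  # shared by request threads and cleanup

    def touch(self, client_id: str, now: float) -> None:
        with self._lock:
            self._last_seen[client_id] = now

    def cleanup(self, now: float) -> int:
        """Remove clients idle longer than idle_seconds; return how many."""
        with self._lock:
            stale = [c for c, t in self._last_seen.items()
                     if now - t > self.idle_seconds]
            for c in stale:
                del self._last_seen[c]
            return len(stale)

    def __len__(self) -> int:  # current bucket count, useful for monitoring
        with self._lock:
            return len(self._last_seen)

    def start_cleanup_thread(self, interval: float = 300.0) -> threading.Thread:
        def loop() -> None:
            while True:
                time.sleep(interval)
                self.cleanup(time.monotonic())

        t = threading.Thread(target=loop, daemon=True, name="rate-limit-cleanup")
        t.start()
        return t
```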

Security Considerations

  1. DoS Protection: Prevents resource exhaustion
  2. Burst Control: Limits sudden traffic spikes
  3. Size Validation: Prevents large payload attacks
  4. IP Blocking: Stops persistent attackers
  5. Global Limits: Protects overall system capacity

Troubleshooting

"Rate limit exceeded" errors

  • Check client request patterns
  • Verify time synchronization
  • Look for retry loops
  • Check IP blocking status

Memory usage increasing

  • Verify cleanup thread is running
  • Check for an unexpectedly large number of distinct client IDs (for example, from rotating User-Agent strings)
  • Monitor bucket count

Legitimate users blocked

  • Review rate limit settings
  • Check whether many users share one IP address (for example, behind NAT or a corporate proxy)
  • Implement IP whitelisting if needed
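If whitelisting is needed, Python's standard `ipaddress` module can match single addresses and CIDR ranges, such as an office NAT range. `IPAllowlist` is a hypothetical helper. Where the check sits in the request path is left open.

```python
import ipaddress
from typing import Iterable


class IPAllowlist:
    # Hypothetical helper (not from rate_limiter.py): exempt listed IPs/ranges.
    def __init__(self, entries: Iterable[str]) -> None:
        # ip_network accepts single addresses ("203.0.113.7") and CIDR ranges.
        self._networks = [ipaddress.ip_network(e, strict=False) for e in entries]

    def __contains__(self, ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        # Membership is False when IPv4 and IPv6 versions differ.
        return any(addr in net for net in self._networks)
```

A check such as `if ip in allowlist:` would then skip the per-client limits for trusted networks. The global limits would still apply.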