Rate Limiting Documentation
This document describes the rate limiting implementation in Talk2Me to protect against DoS attacks and resource exhaustion.
Overview
Talk2Me implements a comprehensive rate limiting system with:
- Token bucket algorithm with sliding window
- Per-endpoint configurable limits
- IP-based blocking (temporary and permanent)
- Global request limits
- Concurrent request throttling
- Request size validation
Rate Limits by Endpoint
Transcription (/transcribe)
- Per Minute: 10 requests
- Per Hour: 100 requests
- Burst Size: 3 requests
- Max Request Size: 10MB
- Token Refresh: 1 token per 6 seconds
Translation (/translate)
- Per Minute: 20 requests
- Per Hour: 300 requests
- Burst Size: 5 requests
- Max Request Size: 100KB
- Token Refresh: 1 token per 3 seconds
Streaming Translation (/translate/stream)
- Per Minute: 10 requests
- Per Hour: 150 requests
- Burst Size: 3 requests
- Max Request Size: 100KB
- Token Refresh: 1 token per 6 seconds
Text-to-Speech (/speak)
- Per Minute: 15 requests
- Per Hour: 200 requests
- Burst Size: 3 requests
- Max Request Size: 50KB
- Token Refresh: 1 token per 4 seconds
API Endpoints
- Push notification and error logging endpoints: various limits (see rate_limiter.py)
Global Limits
- Total Requests Per Minute: 1,000 (across all endpoints)
- Total Requests Per Hour: 10,000
- Concurrent Requests: 50 maximum
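The concurrent-request cap can be pictured as a counting semaphore guarding request handling. The sketch below is illustrative only; the names (MAX_CONCURRENT, acquire_slot) are not taken from the actual implementation.

```python
import threading

# Illustrative sketch of the 50-request concurrency cap; names are not from
# the real rate_limiter.py.
MAX_CONCURRENT = 50
_slots = threading.BoundedSemaphore(MAX_CONCURRENT)

def acquire_slot():
    """Try to reserve a slot without blocking; False means 'throttle this request'."""
    return _slots.acquire(blocking=False)

def release_slot():
    """Release the slot when the request finishes (e.g. in a finally block)."""
    _slots.release()
```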
Rate Limiting Headers
Successful responses include:
X-RateLimit-Limit: 20
X-RateLimit-Remaining: 15
X-RateLimit-Reset: 1234567890
Rate limited responses (429) include:
X-RateLimit-Limit: 20
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1234567890
Retry-After: 60
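Clients can use these headers to pace themselves before ever hitting a 429. A minimal sketch, assuming the Python requests library and a placeholder payload:

```python
import time
import requests

# Check the rate limit headers proactively; endpoint and payload are placeholders.
resp = requests.post("http://localhost:5005/translate", json={"text": "hello"})

remaining = int(resp.headers.get("X-RateLimit-Remaining", "1"))
reset_at = int(resp.headers.get("X-RateLimit-Reset", "0"))

if remaining == 0:
    # Out of requests for this window; sleep until the reset timestamp
    time.sleep(max(0, reset_at - time.time()))
```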
Client Identification
Clients are identified by:
- IP address (including X-Forwarded-For support)
- User-Agent string
- Combined hash for uniqueness
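A minimal sketch of how such a client key could be derived, assuming a Flask request context; the real logic in rate_limiter.py may differ:

```python
import hashlib
from flask import request

def get_client_id():
    # Prefer the first address in X-Forwarded-For when behind a proxy
    forwarded = request.headers.get("X-Forwarded-For", "")
    ip = forwarded.split(",")[0].strip() if forwarded else request.remote_addr
    user_agent = request.headers.get("User-Agent", "")
    # Hash the combination so the identifier is compact and uniform
    return hashlib.sha256(f"{ip}:{user_agent}".encode()).hexdigest()[:16]
```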
Automatic Blocking
IPs are temporarily blocked for 1 hour if:
- They exceed 100 requests per minute
- They repeatedly hit rate limits
- They exhibit suspicious patterns
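A simplified sketch of the temporary-block rule above; the threshold and duration mirror this document, but the function and variable names are illustrative:

```python
import time

BLOCK_DURATION = 3600      # 1 hour, per the policy above
ABUSE_THRESHOLD = 100      # requests per minute before an automatic block

blocked_until = {}         # client_id -> unix timestamp when the block expires

def is_allowed(client_id, requests_last_minute):
    now = time.time()
    if blocked_until.get(client_id, 0) > now:
        return False       # still serving a temporary block
    if requests_last_minute > ABUSE_THRESHOLD:
        blocked_until[client_id] = now + BLOCK_DURATION
        return False
    return True
```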
Configuration
Environment Variables
# No direct environment variables for rate limiting
# Configured in code - can be extended to use env vars
Programmatic Configuration
Rate limits can be adjusted in rate_limiter.py:
self.endpoint_limits = {
    '/transcribe': {
        'requests_per_minute': 10,
        'requests_per_hour': 100,
        'burst_size': 3,
        'token_refresh_rate': 0.167,
        'max_request_size': 10 * 1024 * 1024  # 10MB
    }
}
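If environment-variable overrides are added later, they could wrap the same dictionary. The variable names below are hypothetical and do not exist yet:

```python
import os

# Hypothetical env-var overrides for the /transcribe limits shown above.
transcribe_limits = {
    'requests_per_minute': int(os.environ.get('RATE_LIMIT_TRANSCRIBE_PER_MINUTE', 10)),
    'requests_per_hour': int(os.environ.get('RATE_LIMIT_TRANSCRIBE_PER_HOUR', 100)),
    'burst_size': int(os.environ.get('RATE_LIMIT_TRANSCRIBE_BURST', 3)),
    'max_request_size': int(os.environ.get('RATE_LIMIT_TRANSCRIBE_MAX_BYTES', 10 * 1024 * 1024)),
}
```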
Admin Endpoints
Get Rate Limit Configuration
curl -H "X-Admin-Token: your-admin-token" \
http://localhost:5005/admin/rate-limits
Get Rate Limit Statistics
# Global stats
curl -H "X-Admin-Token: your-admin-token" \
http://localhost:5005/admin/rate-limits/stats
# Client-specific stats
curl -H "X-Admin-Token: your-admin-token" \
http://localhost:5005/admin/rate-limits/stats?client_id=abc123
Block IP Address
# Temporary block (1 hour)
curl -X POST -H "X-Admin-Token: your-admin-token" \
-H "Content-Type: application/json" \
-d '{"ip": "192.168.1.100", "duration": 3600}' \
http://localhost:5005/admin/block-ip
# Permanent block
curl -X POST -H "X-Admin-Token: your-admin-token" \
-H "Content-Type: application/json" \
-d '{"ip": "192.168.1.100", "permanent": true}' \
http://localhost:5005/admin/block-ip
Algorithm Details
Token Bucket
- Each client gets a bucket with configurable burst size
- Tokens regenerate at a fixed rate
- Requests consume tokens
- Empty bucket = request denied
Sliding Window
- Tracks requests in the last minute and hour
- More accurate than fixed windows
- Prevents gaming the system at window boundaries
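A condensed sketch of how the two mechanisms combine for a single client; the real rate_limiter.py also tracks the hourly window and per-endpoint settings:

```python
import time
from collections import deque

class Bucket:
    """Token bucket plus a one-minute sliding window (simplified)."""

    def __init__(self, burst_size, refresh_rate):
        self.tokens = burst_size              # bucket starts full
        self.burst_size = burst_size
        self.refresh_rate = refresh_rate      # tokens regenerated per second
        self.last_refill = time.time()
        self.minute_window = deque()          # timestamps of accepted requests

    def allow(self, per_minute_limit):
        now = time.time()
        # Token bucket: refill at a fixed rate, capped at the burst size
        self.tokens = min(self.burst_size,
                          self.tokens + (now - self.last_refill) * self.refresh_rate)
        self.last_refill = now
        # Sliding window: discard entries older than 60 seconds
        while self.minute_window and now - self.minute_window[0] > 60:
            self.minute_window.popleft()
        if self.tokens < 1 or len(self.minute_window) >= per_minute_limit:
            return False                      # empty bucket or window full
        self.tokens -= 1
        self.minute_window.append(now)
        return True
```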
Best Practices
For Users
- Implement exponential backoff when receiving 429 errors (see the sketch after this list)
- Check rate limit headers to avoid hitting limits
- Cache responses when possible
- Use bulk operations where available
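A minimal backoff helper, assuming the Python requests library; the URL and payload are placeholders:

```python
import time
import requests

def post_with_backoff(url, payload, max_retries=5):
    delay = 1
    for _ in range(max_retries):
        resp = requests.post(url, json=payload)
        if resp.status_code != 429:
            return resp
        # Prefer the server's Retry-After hint, otherwise back off exponentially
        time.sleep(int(resp.headers.get("Retry-After", delay)))
        delay = min(delay * 2, 60)
    return resp
```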
For Administrators
- Monitor rate limit statistics regularly
- Adjust limits based on usage patterns
- Use IP blocking sparingly
- Set up alerts for suspicious activity
Error Responses
Rate Limited (429)
{
  "error": "Rate limit exceeded (per minute)",
  "retry_after": 60
}
Request Too Large (413)
{
  "error": "Request too large"
}
IP Blocked (429)
{
  "error": "IP temporarily blocked due to excessive requests"
}
Monitoring
Key metrics to monitor:
- Rate limit hits by endpoint
- Blocked IPs
- Concurrent request peaks
- Request size violations
- Global limit approaches
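These metrics can be pulled from the admin statistics endpoint shown earlier. The field names in the sketch below are assumptions about the response shape:

```python
import requests

ADMIN_TOKEN = "your-admin-token"

stats = requests.get(
    "http://localhost:5005/admin/rate-limits/stats",
    headers={"X-Admin-Token": ADMIN_TOKEN},
).json()

# Hypothetical field names; adjust to the actual response
if stats.get("blocked_ips", 0) > 10:
    print("Alert: unusually many blocked IPs")
if stats.get("concurrent_requests", 0) > 40:
    print("Alert: approaching the 50 concurrent-request cap")
```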
Performance Impact
- Minimal overhead (~1-2ms per request)
- Memory usage scales with active clients
- Automatic cleanup of old buckets
- Thread-safe implementation
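The automatic bucket cleanup listed above can be pictured as a small daemon thread. The interval and idle cutoff below are assumptions, not the real implementation's values:

```python
import threading
import time

buckets = {}                     # client_id -> bucket (each with a .last_refill field)
buckets_lock = threading.Lock()

def cleanup_loop(max_age=3600, interval=300):
    """Periodically drop buckets that have been idle longer than max_age seconds."""
    while True:
        time.sleep(interval)
        cutoff = time.time() - max_age
        with buckets_lock:
            for cid in [c for c, b in buckets.items() if b.last_refill < cutoff]:
                del buckets[cid]

threading.Thread(target=cleanup_loop, daemon=True).start()
```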
Security Considerations
- DoS Protection: Prevents resource exhaustion
- Burst Control: Limits sudden traffic spikes
- Size Validation: Prevents large payload attacks
- IP Blocking: Stops persistent attackers
- Global Limits: Protects overall system capacity
Troubleshooting
"Rate limit exceeded" errors
- Check client request patterns
- Verify time synchronization
- Look for retry loops
- Check IP blocking status
Memory usage increasing
- Verify cleanup thread is running
- Check for client ID explosion
- Monitor bucket count
Legitimate users blocked
- Review rate limit settings
- Check for shared IP issues
- Implement IP whitelisting if needed