How Rate Limiting Works - Access Control to Protect APIs

14 min read | 2025.12.13

What is Rate Limiting?

Rate Limiting is a mechanism that limits the number of API requests within a certain time period. It maintains service stability and protects systems from malicious use or excessive access due to bugs.

Why it’s needed: Without request limits, a single user could exhaust all system resources, or DDoS attacks could bring down services.

Purposes of Rate Limiting

| Purpose | Description |
|---|---|
| Service protection | Prevent downtime from overload |
| Fairness | Distribute resources fairly to all users |
| Abuse prevention | Deter scraping and brute-force attacks |
| Cost management | Keep infrastructure costs predictable |

Major Algorithms

1. Fixed Window

Resets the count at each fixed time window.

| Time | Requests | Status |
|---|---|---|
| 00:00-00:30 | 90 | ✓ |
| 00:30-00:59 | 10 (total 100) | ✓ |
| 01:00 | Counter reset | - |
| 01:00-01:30 | 100 | ✓ |

Problem: at window boundaries, up to double the limit can momentarily get through

| Time | Requests | Problem |
|---|---|---|
| 00:59 | 100 ✓ | |
| 01:00 | 100 ✓ | 200 requests in 2 seconds! |
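The counter-reset behavior can be sketched as a small in-memory limiter. This is an illustrative sketch, not a library's implementation; the class and parameter names are my own:

```javascript
// Fixed-window limiter: one counter per window, reset at each boundary.
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windowStart = 0; // start of the current window (ms)
    this.count = 0;
  }

  allow(now = Date.now()) {
    // A new window has begun: align to the boundary and reset the counter.
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now - (now % this.windowMs);
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}
```

The boundary problem from the table is visible here: a burst just before a boundary and another just after it both pass, allowing up to double the limit in a short span.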

2. Sliding Window Log

Records timestamps of each request and counts requests in the past N seconds.

Current time: 00:01:30, Window: past 60 seconds (00:00:30-00:01:30)

| Timestamp | Status |
|---|---|
| 00:00:25 | Outside window (deleted) |
| 00:00:35 | ✓ In window |
| 00:00:50 | ✓ In window |
| 00:01:10 | ✓ In window |

Advantage: Accurate rate limiting
Disadvantage: High memory usage (one timestamp stored per request)
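A minimal sketch of the log-pruning logic, with illustrative names (timestamps in milliseconds rather than the clock times in the table):

```javascript
// Sliding-window-log limiter: store one timestamp per request and prune
// entries that have fallen out of the window before counting.
class SlidingWindowLog {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.log = []; // request timestamps, oldest first
  }

  allow(now = Date.now()) {
    // Delete timestamps outside the window, as in the table above.
    while (this.log.length && this.log[0] <= now - this.windowMs) {
      this.log.shift();
    }
    if (this.log.length >= this.limit) return false;
    this.log.push(now);
    return true;
  }
}
```

The memory cost is what the disadvantage refers to: up to `limit` timestamps are kept per client.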

3. Sliding Window Counter

Improved version of fixed window. Calculates using weighted counts from previous and current windows.

| Window | Requests |
|---|---|
| Previous (00:00-00:59) | 80 |
| Current (01:00-01:59) | 30 |

Current time: 01:00:20 (33% into the current window)

Estimated requests = 80 × (1 − 0.33) + 30 = 80 × 0.67 + 30 = 83.6
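The weighted estimate is a one-line formula; this small sketch (function name is mine) reproduces the table's numbers:

```javascript
// Sliding-window-counter estimate: weight the previous window's count by
// the fraction of it still inside the sliding window, then add the
// current window's count.
function slidingWindowEstimate(prevCount, currCount, elapsedFraction) {
  return prevCount * (1 - elapsedFraction) + currCount;
}

// Table's figures: 80 previous, 30 current, 33% into the current window.
const estimate = slidingWindowEstimate(80, 30, 0.33); // ≈ 83.6
```

This keeps only two counters per client (versus one timestamp per request for the log approach), at the cost of assuming requests were evenly spread across the previous window.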

4. Token Bucket

Tokens are added to a bucket at a constant rate, and each request consumes a token.

Configuration: Bucket capacity: 10 tokens, Refill rate: 1 token/second

| State | Tokens | Description |
|---|---|---|
| Initial | 10/10 | Full bucket |
| After 5 requests | 5/10 | 5 tokens consumed |
| After 3 seconds | 8/10 | 3 tokens refilled |
| Burst capacity | 8 | Up to 8 requests possible at once |

Advantage: Handles bursts, memory efficient
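A sketch matching the configuration above (capacity 10, refill 1 token/second); the `now` parameter makes the refill math explicit and testable, and the names are illustrative:

```javascript
// Token bucket: refill lazily based on elapsed time, then try to
// consume one token per request.
class TokenBucket {
  constructor(capacity = 10, refillPerSec = 1, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = now;
  }

  allow(now = Date.now()) {
    // Add tokens for the time elapsed, capped at bucket capacity.
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

Only two numbers (token count and last-refill time) are stored per client, which is where the memory efficiency comes from.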

5. Leaky Bucket

Incoming requests are queued in a bucket and drained (processed) at a constant rate, regardless of how fast they arrive.

```mermaid
flowchart LR
    In["Inflow<br/>(variable)"] --> Bucket["Bucket<br/>(Queue)"] --> Out["Outflow<br/>(fixed rate)"]
```

Advantage: Stable output rate
Disadvantage: Doesn't handle bursts well
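A simplified sketch of the diagram above, modeling the queue as a water level that drains at a fixed rate (names and parameters are illustrative):

```javascript
// Leaky bucket: arrivals fill the bucket, which leaks at a constant
// rate; arrivals that would overflow the bucket are rejected.
class LeakyBucket {
  constructor(capacity, leakPerSec, now = 0) {
    this.capacity = capacity;
    this.leakPerSec = leakPerSec;
    this.water = 0; // queued requests
    this.lastLeak = now;
  }

  allow(now) {
    // Drain at the fixed rate for the time elapsed since the last check.
    const elapsedSec = (now - this.lastLeak) / 1000;
    this.water = Math.max(0, this.water - elapsedSec * this.leakPerSec);
    this.lastLeak = now;
    if (this.water + 1 > this.capacity) return false; // would overflow
    this.water += 1;
    return true;
  }
}
```

The burst weakness is visible here: a sudden spike overflows the small queue immediately, whereas a token bucket would absorb it up to its saved-up capacity.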

Implementation Patterns

Response Headers

```http
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1640000000
```

Response When Limit Exceeded

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Please retry after 30 seconds.",
  "retry_after": 30
}
```

Redis Implementation Example

```javascript
// Fixed-window counter backed by Redis (assumes an ioredis-style client
// bound to `redis`): one key per user, expiring at the window length.
async function checkRateLimit(userId, limit, windowSec) {
  const key = `ratelimit:${userId}`;
  const current = await redis.incr(key);

  // First request in this window: start the expiry clock.
  if (current === 1) {
    await redis.expire(key, windowSec);
  }

  if (current > limit) {
    // The key's TTL tells the client how long until the window resets.
    const ttl = await redis.ttl(key);
    return { allowed: false, retryAfter: ttl };
  }

  return { allowed: true, remaining: limit - current };
}
```
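One caveat: the INCR and EXPIRE calls above are two separate round trips, so a crash between them leaves a counter that never expires. A common remedy is to run both in a single Lua script via EVAL. A sketch, assuming an ioredis-style `eval(script, numKeys, ...keysAndArgs)` signature:

```javascript
// Atomic variant: INCR and EXPIRE execute together inside Redis.
const RATE_LIMIT_SCRIPT = `
  local current = redis.call('INCR', KEYS[1])
  if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
  end
  return current
`;

async function checkRateLimitAtomic(redis, userId, limit, windowSec) {
  const current = await redis.eval(
    RATE_LIMIT_SCRIPT, 1, `ratelimit:${userId}`, windowSec
  );
  return current <= limit;
}
```
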

Rate Limit Granularity

User-based

| User | Limit |
|---|---|
| User A | 100 requests/minute |
| User B | 100 requests/minute |

IP Address-based

| IP Address | Limit |
|---|---|
| 192.168.1.1 | 100 requests/minute |
| 192.168.1.2 | 100 requests/minute |

Endpoint-based

| Endpoint | Limit | Note |
|---|---|---|
| GET /api/users | 100 requests/minute | |
| POST /api/users | 10 requests/minute | Stricter for creation |

Tiered

| Tier | Limit |
|---|---|
| Free | 100 requests/day |
| Pro | 10,000 requests/day |
| Enterprise | Unlimited |
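In practice these granularities combine into the counter key. One possible key scheme, shown as a sketch (the key shapes and the `rl:` prefix are illustrative, not a standard):

```javascript
// Build a rate-limit counter key from whichever identifiers apply.
// windowStart scopes the counter to one fixed window.
function rateLimitKey({ userId, ip, endpoint, windowStart }) {
  if (userId && endpoint) return `rl:user:${userId}:${endpoint}:${windowStart}`;
  if (userId) return `rl:user:${userId}:${windowStart}`;
  return `rl:ip:${ip}:${windowStart}`;
}

// Tier limits from the table; Infinity models "unlimited".
const TIER_LIMITS = { free: 100, pro: 10000, enterprise: Infinity };
```

Authenticated traffic is usually limited per user (optionally per endpoint), while anonymous traffic falls back to the IP key.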

Considerations for Distributed Systems

Centralized

```mermaid
flowchart LR
    S1["Server 1"] --> Redis["Redis<br/>(shared counter)"]
    S2["Server 2"] --> Redis
    S3["Server 3"] --> Redis
```

Advantage: Accurate
Disadvantage: Latency to Redis on every request

Local Cache + Sync

```mermaid
flowchart LR
    S1["Server 1<br/>[Local counter]"] <-->|"Periodic sync"| S2["Server 2<br/>[Local counter]"]
```

Advantage: Low latency
Disadvantage: some limit overrun must be tolerated between syncs
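A toy model of the local-counter approach: each server admits requests against an equal share of the global limit and resets on sync. This is a deliberate simplification (real systems redistribute unused quota on sync); all names are mine:

```javascript
// Each server enforces globalLimit / numServers locally, so no network
// call happens per request. Skew between servers is the overrun the
// "disadvantage" above refers to.
class LocalCounter {
  constructor(globalLimit, numServers) {
    this.localLimit = Math.ceil(globalLimit / numServers);
    this.count = 0;
  }

  allow() {
    if (this.count >= this.localLimit) return false;
    this.count++;
    return true;
  }

  // Called on each sync interval; a real coordinator could also
  // rebalance quota between servers here.
  sync() {
    this.count = 0;
  }
}
```
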

Client-side Handling

Exponential Backoff

```javascript
// Retry on 429, honoring Retry-After when present and falling back to
// exponential backoff (2^i seconds) otherwise.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    const response = await fetch(url);

    if (response.status === 429) {
      // Retry-After is in seconds; if absent or unparsable, back off.
      const retryAfter = Number(response.headers.get('Retry-After')) || 2 ** i;
      await sleep(retryAfter * 1000);
      continue;
    }

    return response;
  }
  throw new Error('Rate limit exceeded after retries');
}
```
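When many clients are throttled at once, fixed exponential delays can make them all retry in lockstep; adding randomness ("jitter") spreads the retries out. A sketch of the full-jitter variant (function name and defaults are illustrative):

```javascript
// Full jitter: sleep a uniformly random duration up to the exponential
// cap, instead of exactly the exponential value.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  const expMs = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * expMs; // uniform in [0, expMs)
}
```
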

Summary

Rate limiting is an important mechanism for ensuring API stability and fairness. By selecting appropriate algorithms like token bucket or sliding window for your use case and setting limits at appropriate granularity, you can protect services while providing a good user experience.
