How Rate Limiting Works - Access Control to Protect APIs

14 min read | 2025.12.13

What is Rate Limiting?

Rate Limiting is a mechanism that limits the number of API requests within a certain time period. It maintains service stability and protects systems from malicious use or excessive access due to bugs.

Why it’s needed: Without request limits, a single user could exhaust all system resources, or DDoS attacks could bring down services.

Purposes of Rate Limiting

| Purpose | Description |
|---|---|
| Service protection | Prevent downtime from overload |
| Fairness | Distribute resources fairly to all users |
| Abuse prevention | Deter scraping and brute-force attacks |
| Cost management | Keep infrastructure costs predictable |

Major Algorithms

1. Fixed Window

Resets the count at each fixed time window.

| Time | Requests | Status |
|---|---|---|
| 00:00-00:30 | 90 | ✓ |
| 00:30-00:59 | 10 (total 100) | ✓ |
| 01:00 | Counter reset | - |
| 01:00-01:30 | 100 | ✓ |

Problem: at window boundaries, up to double the limit can momentarily get through

| Time | Requests | Problem |
|---|---|---|
| 00:59 | 100 ✓ | |
| 01:00 | 100 ✓ | 200 requests in 2 seconds! |
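The counter-reset behavior can be sketched as a small in-memory limiter. This is an illustrative sketch, not a library's implementation; the class and parameter names are my own:

```javascript
// Fixed-window limiter: one counter per window, reset at each boundary.
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windowStart = 0; // start of the current window (ms)
    this.count = 0;
  }

  allow(now = Date.now()) {
    // A new window has begun: align to the boundary and reset the counter.
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now - (now % this.windowMs);
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}
```

The boundary problem from the table is visible here: a burst just before a boundary and another just after it both pass, allowing up to double the limit in a short span.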

2. Sliding Window Log

Records timestamps of each request and counts requests in the past N seconds.

Current time: 00:01:30, Window: past 60 seconds (00:00:30-00:01:30)

| Timestamp | Status |
|---|---|
| 00:00:25 | Outside window (deleted) |
| 00:00:35 | ✓ In window |
| 00:00:50 | ✓ In window |
| 00:01:10 | ✓ In window |

Advantage: Accurate rate limiting
Disadvantage: High memory usage (one timestamp stored per request)
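A minimal sketch of the log-pruning logic, with illustrative names (timestamps in milliseconds rather than the clock times in the table):

```javascript
// Sliding-window-log limiter: store one timestamp per request and prune
// entries that have fallen out of the window before counting.
class SlidingWindowLog {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.log = []; // request timestamps, oldest first
  }

  allow(now = Date.now()) {
    // Delete timestamps outside the window, as in the table above.
    while (this.log.length && this.log[0] <= now - this.windowMs) {
      this.log.shift();
    }
    if (this.log.length >= this.limit) return false;
    this.log.push(now);
    return true;
  }
}
```

The memory cost is what the disadvantage refers to: up to `limit` timestamps are kept per client.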

3. Sliding Window Counter

Improved version of fixed window. Calculates using weighted counts from previous and current windows.

| Window | Requests |
|---|---|
| Previous (00:00-00:59) | 80 |
| Current (01:00-01:59) | 30 |

Current time: 01:00:20 (33% into the current window)

Estimated requests = 80 × (1 − 0.33) + 30 = 80 × 0.67 + 30 = 83.6
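The weighted estimate is a one-line formula; this small sketch (function name is mine) reproduces the table's numbers:

```javascript
// Sliding-window-counter estimate: weight the previous window's count by
// the fraction of it still inside the sliding window, then add the
// current window's count.
function slidingWindowEstimate(prevCount, currCount, elapsedFraction) {
  return prevCount * (1 - elapsedFraction) + currCount;
}

// Table's figures: 80 previous, 30 current, 33% into the current window.
const estimate = slidingWindowEstimate(80, 30, 0.33); // ≈ 83.6
```

This keeps only two counters per client (versus one timestamp per request for the log approach), at the cost of assuming requests were evenly spread across the previous window.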

4. Token Bucket

Tokens are added to a bucket at a constant rate, and each request consumes a token.

Configuration: Bucket capacity: 10 tokens, Refill rate: 1 token/second

| State | Tokens | Description |
|---|---|---|
| Initial | 10/10 | Full bucket |
| After 5 requests | 5/10 | 5 tokens consumed |
| After 3 seconds | 8/10 | 3 tokens refilled |
| Burst capacity | 8 | Up to 8 requests possible at once |

Advantage: Handles bursts, memory efficient
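A sketch matching the configuration above (capacity 10, refill 1 token/second); the `now` parameter makes the refill math explicit and testable, and the names are illustrative:

```javascript
// Token bucket: refill lazily based on elapsed time, then try to
// consume one token per request.
class TokenBucket {
  constructor(capacity = 10, refillPerSec = 1, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = now;
  }

  allow(now = Date.now()) {
    // Add tokens for the time elapsed, capped at bucket capacity.
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

Only two numbers (token count and last-refill time) are stored per client, which is where the memory efficiency comes from.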

5. Leaky Bucket

Incoming requests are queued in a bucket and drained (processed) at a constant rate, regardless of how fast they arrive.

```mermaid
flowchart LR
    In["Inflow<br/>(variable)"] --> Bucket["Bucket<br/>(Queue)"] --> Out["Outflow<br/>(fixed rate)"]
```

Advantage: Stable output rate
Disadvantage: Doesn't handle bursts well
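A simplified sketch of the diagram above, modeling the queue as a water level that drains at a fixed rate (names and parameters are illustrative):

```javascript
// Leaky bucket: arrivals fill the bucket, which leaks at a constant
// rate; arrivals that would overflow the bucket are rejected.
class LeakyBucket {
  constructor(capacity, leakPerSec, now = 0) {
    this.capacity = capacity;
    this.leakPerSec = leakPerSec;
    this.water = 0; // queued requests
    this.lastLeak = now;
  }

  allow(now) {
    // Drain at the fixed rate for the time elapsed since the last check.
    const elapsedSec = (now - this.lastLeak) / 1000;
    this.water = Math.max(0, this.water - elapsedSec * this.leakPerSec);
    this.lastLeak = now;
    if (this.water + 1 > this.capacity) return false; // would overflow
    this.water += 1;
    return true;
  }
}
```

The burst weakness is visible here: a sudden spike overflows the small queue immediately, whereas a token bucket would absorb it up to its saved-up capacity.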

Implementation Patterns

Response Headers

```http
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1640000000
```

Response When Limit Exceeded

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Please retry after 30 seconds.",
  "retry_after": 30
}
```

Redis Implementation Example

```javascript
// Fixed-window counter backed by Redis (assumes an ioredis-style client
// bound to `redis`): one key per user, expiring at the window length.
async function checkRateLimit(userId, limit, windowSec) {
  const key = `ratelimit:${userId}`;
  const current = await redis.incr(key);

  // First request in this window: start the expiry clock.
  if (current === 1) {
    await redis.expire(key, windowSec);
  }

  if (current > limit) {
    // The key's TTL tells the client how long until the window resets.
    const ttl = await redis.ttl(key);
    return { allowed: false, retryAfter: ttl };
  }

  return { allowed: true, remaining: limit - current };
}
```
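One caveat: the INCR and EXPIRE calls above are two separate round trips, so a crash between them leaves a counter that never expires. A common remedy is to run both in a single Lua script via EVAL. A sketch, assuming an ioredis-style `eval(script, numKeys, ...keysAndArgs)` signature:

```javascript
// Atomic variant: INCR and EXPIRE execute together inside Redis.
const RATE_LIMIT_SCRIPT = `
  local current = redis.call('INCR', KEYS[1])
  if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
  end
  return current
`;

async function checkRateLimitAtomic(redis, userId, limit, windowSec) {
  const current = await redis.eval(
    RATE_LIMIT_SCRIPT, 1, `ratelimit:${userId}`, windowSec
  );
  return current <= limit;
}
```
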

Rate Limit Granularity

User-based

| User | Limit |
|---|---|
| User A | 100 requests/minute |
| User B | 100 requests/minute |

IP Address-based

| IP Address | Limit |
|---|---|
| 192.168.1.1 | 100 requests/minute |
| 192.168.1.2 | 100 requests/minute |

Endpoint-based

| Endpoint | Limit | Note |
|---|---|---|
| GET /api/users | 100 requests/minute | |
| POST /api/users | 10 requests/minute | Stricter for creation |

Tiered

| Tier | Limit |
|---|---|
| Free | 100 requests/day |
| Pro | 10,000 requests/day |
| Enterprise | Unlimited |
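In practice these granularities combine into the counter key. One possible key scheme, shown as a sketch (the key shapes and the `rl:` prefix are illustrative, not a standard):

```javascript
// Build a rate-limit counter key from whichever identifiers apply.
// windowStart scopes the counter to one fixed window.
function rateLimitKey({ userId, ip, endpoint, windowStart }) {
  if (userId && endpoint) return `rl:user:${userId}:${endpoint}:${windowStart}`;
  if (userId) return `rl:user:${userId}:${windowStart}`;
  return `rl:ip:${ip}:${windowStart}`;
}

// Tier limits from the table; Infinity models "unlimited".
const TIER_LIMITS = { free: 100, pro: 10000, enterprise: Infinity };
```

Authenticated traffic is usually limited per user (optionally per endpoint), while anonymous traffic falls back to the IP key.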

Considerations for Distributed Systems

Centralized

```mermaid
flowchart LR
    S1["Server 1"] --> Redis["Redis<br/>(shared counter)"]
    S2["Server 2"] --> Redis
    S3["Server 3"] --> Redis
```

Advantage: Accurate
Disadvantage: Latency to Redis on every request

Local Cache + Sync

```mermaid
flowchart LR
    S1["Server 1<br/>[Local counter]"] <-->|"Periodic sync"| S2["Server 2<br/>[Local counter]"]
```

Advantage: Low latency
Disadvantage: some limit overrun must be tolerated between syncs
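A toy model of the local-counter approach: each server admits requests against an equal share of the global limit and resets on sync. This is a deliberate simplification (real systems redistribute unused quota on sync); all names are mine:

```javascript
// Each server enforces globalLimit / numServers locally, so no network
// call happens per request. Skew between servers is the overrun the
// "disadvantage" above refers to.
class LocalCounter {
  constructor(globalLimit, numServers) {
    this.localLimit = Math.ceil(globalLimit / numServers);
    this.count = 0;
  }

  allow() {
    if (this.count >= this.localLimit) return false;
    this.count++;
    return true;
  }

  // Called on each sync interval; a real coordinator could also
  // rebalance quota between servers here.
  sync() {
    this.count = 0;
  }
}
```
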

Client-side Handling

Exponential Backoff

```javascript
// Retry on 429, honoring Retry-After when present and falling back to
// exponential backoff (2^i seconds) otherwise.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    const response = await fetch(url);

    if (response.status === 429) {
      // Retry-After is in seconds; if absent or unparsable, back off.
      const retryAfter = Number(response.headers.get('Retry-After')) || 2 ** i;
      await sleep(retryAfter * 1000);
      continue;
    }

    return response;
  }
  throw new Error('Rate limit exceeded after retries');
}
```
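When many clients are throttled at once, fixed exponential delays can make them all retry in lockstep; adding randomness ("jitter") spreads the retries out. A sketch of the full-jitter variant (function name and defaults are illustrative):

```javascript
// Full jitter: sleep a uniformly random duration up to the exponential
// cap, instead of exactly the exponential value.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  const expMs = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * expMs; // uniform in [0, expMs)
}
```
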

Summary

Rate limiting is an important mechanism for ensuring API stability and fairness. By selecting appropriate algorithms like token bucket or sliding window for your use case and setting limits at appropriate granularity, you can protect services while providing a good user experience.
