When you build APIs that bill per token—like AI workloads—rate limiting stops being just a traffic control feature.
It becomes a revenue-protection mechanism.
We learned this the hard way: if you let users run multiple concurrent AI tasks before their token usage is reconciled, you can lose real money.
So we started from NestJS’s built-in throttler, explored Redis-based options, and eventually built our own token-bucket limiter with Lua.
This post walks through that decision process—what works, what doesn’t, and how to evolve your rate limiting when you move from simple backend requests to token-based billing.
## 1. Starting Point: NestJS Throttler
NestJS ships with a throttler module:
```bash
npm install @nestjs/throttler
```
It’s simple to set up:
```ts
ThrottlerModule.forRoot({
  ttl: 60,   // seconds
  limit: 10, // max requests per TTL window
});
```
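To actually enforce the limits, bind the guard, typically as a global guard (standard `@nestjs/throttler` wiring):

```ts
import { Module } from '@nestjs/common';
import { APP_GUARD } from '@nestjs/core';
import { ThrottlerGuard, ThrottlerModule } from '@nestjs/throttler';

@Module({
  imports: [ThrottlerModule.forRoot({ ttl: 60, limit: 10 })],
  // Apply the throttler to every route in the app.
  providers: [{ provide: APP_GUARD, useClass: ThrottlerGuard }],
})
export class AppModule {}
```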
Behind the scenes, the `ThrottlerGuard` intercepts requests and counts how many times a key (like `ip:route`) appears in a local in-memory map.
### How it works internally

- Each request pushes a timestamp into an array.
- On each hit, it removes timestamps older than `Date.now() - ttl`.
- If `array.length > limit`, it throws a `ThrottlerException` (HTTP 429).
- Old entries expire automatically via `setTimeout`.
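In code, that bookkeeping looks roughly like this (a sketch of the steps above, not the library’s actual source):

```ts
// Sketch of the guard's in-memory bookkeeping.
const hits = new Map<string, number[]>(); // key ("ip:route") -> timestamps

function recordHit(key: string, ttlMs: number, limit: number): void {
  const now = Date.now();
  // Keep only timestamps still inside the window.
  const recent = (hits.get(key) ?? []).filter((t) => t > now - ttlMs);
  recent.push(now);
  hits.set(key, recent);
  if (recent.length > limit) {
    throw new Error('429 Too Many Requests'); // ThrottlerException in NestJS
  }
}
```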
It’s a fixed-window counter—fast, but not distributed. Each NestJS instance has its own counters.
### The problem

If you scale horizontally, every instance has its own throttling state.
A single user hitting multiple instances can easily bypass the limit. For example:

- Instance A: 10 requests
- Instance B: 10 requests
- Combined: 20 requests (the intended limit was 10)

This works fine for small deployments, but fails for multi-node APIs.
## 2. Making It Distributed: Redis Storage
To synchronize rate limits across instances, NestJS supports pluggable storage backends.
You can install the Redis storage adapter:
```bash
npm install nestjs-throttler-storage-redis ioredis
```
and update your module:
```ts
import { ThrottlerStorageRedisService } from 'nestjs-throttler-storage-redis';

ThrottlerModule.forRoot({
  ttl: 60,
  limit: 10,
  storage: new ThrottlerStorageRedisService({
    host: 'localhost',
    port: 6379,
  }),
});
```
### How this version works

Internally, it uses Redis sorted sets (ZSET) and commands like:

```
ZREMRANGEBYSCORE key 0 (now - ttl)
ZADD key now now
EXPIRE key ttl
ZCARD key
```
That turns the throttler into a sliding-window limiter:

- Each timestamp is recorded in Redis.
- Old entries fall off automatically as their scores expire.
- The counter is shared across all app instances.

Distributed, smoother than fixed-window, but still request-based rather than cost-based.
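For intuition, here is roughly the same sliding-window check done by hand with ioredis (an illustrative sketch, not the adapter’s actual implementation; note it uses a unique ZSET member so two hits in the same millisecond don’t collide):

```ts
import IORedis from 'ioredis';

const redis = new IORedis();

// Count requests in the last ttlSec seconds; allow if under the limit.
async function slidingWindowCheck(key: string, ttlSec: number, limit: number) {
  const now = Date.now();
  const results = await redis
    .multi()
    .zremrangebyscore(key, 0, now - ttlSec * 1000) // evict old timestamps
    .zadd(key, now, `${now}:${Math.random()}`)     // record this hit
    .expire(key, ttlSec)                           // clean up idle keys
    .zcard(key)                                    // hits left in the window
    .exec();
  const count = Number(results?.[3]?.[1]);         // result of ZCARD
  return count <= limit;
}
```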
### When it’s good enough

This Redis-backed throttler is perfect if you:

- Only care about requests per second/minute
- Don’t need per-tier token limits
- Want plug-and-play scaling across multiple app instances

But if you’re charging users for token usage, not just requests, it’s not sufficient.
## 3. Why Request Limits Weren’t Enough for AI Workloads

Our use case: users trigger AI tasks that consume tokens.
A “request” can mean anywhere from 100 to 200,000 tokens. That means:

- A user sending 100 small tasks is fine.
- A user sending 3 giant prompts could blow their budget instantly.

We needed rate limiting by token cost, not just request count. And we needed it atomic, tier-aware, and distributed. The NestJS throttler can’t calculate token cost per request. We could extend `ThrottlerGuard`, but it would still lack atomic safety under concurrency.
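Why does atomicity matter? A naive read-modify-write against Redis has a classic race: two concurrent requests both read the same balance and both pass the check. A sketch (the `tokens:` key is hypothetical, for illustration):

```ts
import IORedis from 'ioredis';

// BROKEN: check-then-act against Redis is not atomic.
async function naiveCheck(redis: IORedis, userId: string, cost: number) {
  const remaining = Number(await redis.get(`tokens:${userId}`)); // read
  if (remaining < cost) return false;
  // Another request can execute the same read here and also pass...
  await redis.decrby(`tokens:${userId}`, cost);                  // ...write
  return true; // two concurrent requests may both be allowed
}
```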
That’s when we moved to Redis + Lua.
## 4. Token Bucket with Lua (Tier-Aware and Atomic)
The token-bucket algorithm gives a smooth, fair way to rate-limit while allowing bursts.
Each user has a bucket of tokens that refills at a steady rate.
Each request consumes tokens equal to its cost.
When the bucket’s empty, new requests are rejected until refill.
### The Lua Script

```lua
-- KEYS[1] = user key, e.g. "rate:{123}"
-- ARGV[1] = capacity (max tokens)
-- ARGV[2] = fill_rate_per_ms
-- ARGV[3] = now_ms
-- ARGV[4] = cost (tokens needed)

local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local fill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local cost = tonumber(ARGV[4])

local data = redis.call("HMGET", key, "tokens", "ts")
local tokens = tonumber(data[1])
local ts = tonumber(data[2])

-- First sight of this user: start with a full bucket.
if tokens == nil then tokens = capacity end
if ts == nil then ts = now end

-- Refill based on elapsed time, capped at capacity.
local delta = now - ts
if delta < 0 then delta = 0 end
tokens = math.min(capacity, tokens + (delta * fill_rate))

local allowed = 0
local retry_after_ms = 0
if tokens >= cost then
  tokens = tokens - cost
  allowed = 1
else
  -- How long until enough tokens have refilled?
  retry_after_ms = math.ceil((cost - tokens) / fill_rate)
end

redis.call("HMSET", key, "tokens", tokens, "ts", now)
-- Expire idle buckets once they would be full again anyway.
redis.call("PEXPIRE", key, math.ceil(capacity / fill_rate))

-- Note: Lua numbers are truncated to integers on the way back to Redis.
return { allowed, tokens, retry_after_ms }
```
This script:

- Refills tokens based on elapsed time.
- Deducts the cost atomically.
- Returns the remaining tokens and a retry delay.

No race conditions, even under heavy concurrency: Redis executes the entire script as a single atomic operation.
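You can smoke-test the script from the CLI before wiring it in (assuming it is saved as `token_bucket.lua`, a hypothetical filename; `date +%s%3N` needs GNU date):

```bash
# capacity=100000, fill_rate=1.6 tokens/ms, now=current ms, cost=5000
redis-cli --eval token_bucket.lua 'rate:{123}' , 100000 1.6 "$(date +%s%3N)" 5000
# First call on a fresh key consumes from a full bucket:
# 1) (integer) 1       -> allowed
# 2) (integer) 95000   -> tokens remaining
# 3) (integer) 0       -> retry_after_ms
```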
## 5. Integrating the Lua Bucket in NestJS

Load it dynamically with `ioredis`:
```ts
import IORedis from 'ioredis';

export class RateLimiter {
  private sha!: string;

  constructor(private redis: IORedis, private lua: string) {}

  async init() {
    // Cache the script server-side and keep its SHA for EVALSHA.
    this.sha = (await this.redis.script('LOAD', this.lua)) as string;
  }

  async checkTokens(userId: string, capacity: number, fillRate: number, cost: number) {
    const key = `rate:{${userId}}`;
    const now = Date.now();
    const res = await this.redis.evalsha(this.sha, 1, key, capacity, fillRate, now, cost);
    const [allowed, remaining, retryAfter] = (res as any[]).map(Number);
    return { allowed: !!allowed, remaining, retryAfter };
  }
}
```
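One operational wrinkle worth handling: if Redis restarts or a cluster node fails over, the script cache is flushed and `EVALSHA` fails with a NOSCRIPT error. A one-shot fallback sketch:

```ts
import IORedis from 'ioredis';

// Sketch: EVALSHA with a NOSCRIPT fallback to EVAL.
async function evalBucket(
  redis: IORedis,
  sha: string,
  lua: string,
  key: string,
  args: (string | number)[],
) {
  try {
    return await redis.evalsha(sha, 1, key, ...args);
  } catch (err: any) {
    if (String(err?.message).includes('NOSCRIPT')) {
      // EVAL re-caches the script server-side as a side effect.
      return redis.eval(lua, 1, key, ...args);
    }
    throw err;
  }
}
```

ioredis can also handle this for you if you register the script with `defineCommand` instead of calling `evalsha` directly.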
Then wrap it in a NestJS interceptor:
```ts
import {
  CallHandler,
  ExecutionContext,
  HttpException,
  HttpStatus,
  Injectable,
  NestInterceptor,
} from '@nestjs/common';

@Injectable()
export class TokenRateLimitInterceptor implements NestInterceptor {
  constructor(private limiter: RateLimiter, private tiers: TierService) {}

  async intercept(ctx: ExecutionContext, next: CallHandler) {
    const req = ctx.switchToHttp().getRequest();
    const user = req.user;
    if (!user) return next.handle();

    const tier = this.tiers.get(user.tier);
    const capacity = tier.burstTokens;
    const fillRate = tier.tokensPerMinute / 60000; // tokens per millisecond
    const cost = req.tokenCost ?? 1000; // estimated cost, set upstream

    const { allowed, retryAfter } = await this.limiter.checkTokens(
      user.id, capacity, fillRate, cost,
    );

    if (!allowed) {
      throw new HttpException(
        { message: 'Token limit exceeded', retry_after_ms: retryAfter },
        HttpStatus.TOO_MANY_REQUESTS,
      );
    }
    return next.handle();
  }
}
```
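Register it globally so every route is covered. A sketch of the wiring (the `token_bucket.lua` path and `TierService` are your own application code):

```ts
import { readFileSync } from 'fs';
import { Module } from '@nestjs/common';
import { APP_INTERCEPTOR } from '@nestjs/core';
import IORedis from 'ioredis';

@Module({
  providers: [
    TierService,
    {
      provide: RateLimiter,
      useFactory: async () => {
        const lua = readFileSync('token_bucket.lua', 'utf8'); // hypothetical path
        const limiter = new RateLimiter(new IORedis(), lua);
        await limiter.init(); // load the script before serving traffic
        return limiter;
      },
    },
    { provide: APP_INTERCEPTOR, useClass: TokenRateLimitInterceptor },
  ],
})
export class RateLimitModule {}
```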
Use hash-tagged keys (`rate:{userId}`) so Redis Cluster routes all per-user keys to the same slot.
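In Redis Cluster, only the text inside `{...}` is hashed, so related per-user keys colocate (slot numbers below are illustrative):

```
127.0.0.1:6379> CLUSTER KEYSLOT rate:{123}
(integer) 5970   # illustrative slot value
127.0.0.1:6379> CLUSTER KEYSLOT usage:{123}
(integer) 5970   # same slot, because only "123" is hashed
```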
## 6. Real-World Examples

Several open-source projects use similar Lua-based logic:

- BitMEX/node-redis-token-bucket-ratelimiter – Token bucket in Lua + Redis for Node.js.
- WeTransfer/Prorate – Leaky bucket using the Redis `TIME` command for consistency.
- garana/o1.rate-limiter – Sliding window (ZSET-based) Lua limiter.
- Losant/redis-gcra – GCRA variant for smooth rate limiting.
- Recruitee/plug_limit – Elixir Plug using Redis Lua scripts.

All rely on Redis atomic operations: no modules, no race conditions.
## 7. Putting It All Together

| Approach | Algorithm | Scope | Atomic | Cost-aware | Recommended for |
|----------|-----------|-------|--------|------------|-----------------|
| NestJS Default | Fixed window | Single instance | No | No | Local or small apps |
| Throttler + Redis | Sliding window | Distributed | Yes | No | Multi-instance APIs |
| Lua Token Bucket | Continuous refill | Distributed | Yes | Yes | AI or token-based workloads |
## 8. Takeaways

- The NestJS throttler is easy to use but local to a single process.
- Using Redis storage turns it into a distributed sliding-window limiter.
- For workloads billed by token or requiring atomic accuracy, a Lua token-bucket limiter is safer and more flexible.
- Redis + Lua gives you fast, atomic, and fully distributed enforcement without modifying Redis or adding modules.
- These techniques are proven in production by teams like BitMEX, WeTransfer, and Losant.

If you’re building APIs where each request can have wildly different computational costs, a token-bucket limiter is the line between predictable performance and unexpected losses.
Originally published on ofeng.org.