API Design 3 min read

Rate Limiting Strategies for Multi-Tenant SaaS Platforms

Table of contents

    In consumer systems, rate limiting is a bouncer: its job is to keep abusers out. In multi-tenant B2B, the clients hitting your limits are paying customers with signed contracts, and the real job is fairness - making sure one tenant's enthusiasm is not another tenant's outage. That reframing changes almost every design decision.

    Here is how we approach it on the APIs behind KID and FormOK, and in client platforms we have designed.

    Token buckets, because bursts are legitimate

    Enterprise traffic is bursty by nature: a nightly sync pushes a day of records in minutes, a rebuilt cache refetches everything at 9 a.m. A fixed per-second ceiling punishes these normal patterns and generates support tickets from customers doing nothing wrong.

    A token bucket separates the two questions that matter - sustained rate (refill) and burst tolerance (bucket size) - and lets you set them independently per plan tier. We publish both numbers. "600 requests per minute, bursts up to 100" is a contract an integration engineer can design against; a bare "600 rpm" is an invitation to discover the burst behavior experimentally, in production, at your expense.

    Isolation first, then quota

    The subtle failure mode in multi-tenant platforms is that the limiter itself becomes a shared resource. If all tenants drain one pool - one overloaded limiter store, one saturated worker fleet behind it - a single tenant's burst degrades everyone, and your rate limiter has reproduced the noisy-neighbor problem it existed to solve.

    Our layering:

    1. Platform guardrail - a global ceiling protecting infrastructure, set well above the sum of expected traffic. Tripping it is an incident, not a policy.
    2. Per-tenant quota - the contractual limit, enforced per credential.
    3. Per-tenant concurrency cap - limits on in-flight requests, which protect shared workers from slow-request pileups that pure rate limits miss.

    The concurrency cap is the one teams skip and later wish they had not. Ten requests per second is harmless when each takes 50 milliseconds and devastating when each takes 30 seconds.

    A 429 is an API response, deserve the same care

    The difference between a rate limit that generates outages and one that generates well-behaved clients is mostly communication:

    HTTP/1.1 429 Too Many Requests
    Retry-After: 12
    RateLimit-Limit: 600
    RateLimit-Remaining: 0
    RateLimit-Reset: 12

    Two rules we enforce. First, headers appear on every response, not only the 429 - clients that can see RateLimit-Remaining falling implement backpressure before the failure, and the ones that ignore it at least fail with the information present. Second, Retry-After is honest: it reflects actual refill, plus jitter guidance in the docs, so a fleet of clients does not synchronize into a retry stampede at second zero.

    And read operations should never share a bucket with writes. A tenant paging through history must not starve their own webhook deliveries.

    Limits are a product surface

    The technical design is half the work; the other half is organizational. Sales will ask for exceptions, and the worst possible process is a Slack message to an engineer who edits a config. We build tenant-level overrides into the admin plane from day one - bounded, logged, time-boxed where appropriate - so a limit increase is a product operation with an audit trail, not a favor.

    The health metric we watch is not 429 volume but sustained throttling: a tenant bumping the ceiling for days is either an integration bug we should flag to them or a customer who has outgrown their tier. Both deserve outreach before they become churn. A tenant who hits a limit, understands it, and knows the upgrade path is a healthy account. A tenant who discovers limits as mysterious production failures is writing your next negative reference call.

    Rate limiting done well is invisible to well-behaved integrations and legible to everyone else. That legibility is the feature.

    Copied