Capacity Planning Before Black Friday: A Load Testing Playbook

    Every autumn a familiar engagement lands on our desk: "We expect three times last year's Black Friday traffic. Will we survive?" The honest answer is always the same - nobody knows until you test, and the test is only as good as the traffic model behind it.

    This is the playbook we run with clients, refined over several peak seasons. It starts two months before the date, not two weeks.

    Step 1: Build the load model from production, not from guesses

    The most common capacity-planning failure is testing the wrong traffic. Uniform synthetic load against the home page tells you almost nothing, because real peak traffic is shaped: a burst of product-page reads, a deep funnel of cart and checkout writes, and background API traffic from mobile apps that does not slow down when humans do.

    We derive the model from production access logs of the previous peak:

    • Endpoint mix by percentage of requests, split into read and write paths
    • Session flow probabilities (what fraction of product views become cart adds)
    • Concurrency shape - peak arrival rate and how steep the ramp is
    • Cache hit ratios, because a cold cache at peak is a different system
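
    As a minimal sketch of the first two bullets, the snippet below computes an endpoint mix and one session-flow probability. It assumes a simplified, hypothetical log format of `session_id method path` per line (real access logs need a proper parser), treats GET and HEAD as reads, and measures the view-to-cart rate per session rather than per view. The sample data and route names are illustrative only.

```python
from collections import Counter

# Hypothetical, simplified log lines: "<session_id> <method> <path>".
SAMPLE_LOG = """\
s1 GET /product/42
s1 POST /cart
s2 GET /product/42
s2 GET /product/7
s3 GET /product/9
s3 POST /cart
s3 POST /checkout
s4 GET /api/feed
"""

READ_METHODS = {"GET", "HEAD"}  # assumption: everything else is a write


def route(path: str) -> str:
    """Collapse numeric IDs so /product/42 and /product/7 count as one endpoint."""
    return "/".join(":id" if part.isdigit() else part for part in path.split("/"))


def endpoint_mix(log_text: str) -> dict:
    """Share of total requests per (read|write, route)."""
    counts = Counter()
    for line in log_text.splitlines():
        _session, method, path = line.split()
        kind = "read" if method in READ_METHODS else "write"
        counts[(kind, route(path))] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def view_to_cart_rate(log_text: str) -> float:
    """Fraction of sessions with a product view that also add to cart."""
    viewed, carted = set(), set()
    for line in log_text.splitlines():
        session, method, path = line.split()
        if method == "GET" and path.startswith("/product/"):
            viewed.add(session)
        elif method == "POST" and path == "/cart":
            carted.add(session)
    return len(viewed & carted) / len(viewed) if viewed else 0.0


mix = endpoint_mix(SAMPLE_LOG)
cart_rate = view_to_cart_rate(SAMPLE_LOG)
```

    The resulting fractions become the weights of the load script, so the test exercises read and write paths in the same proportion production did.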

    Then we scale the arrival rate, not the user count. Real Black Friday load is an open workload: customers keep arriving regardless of how slowly you respond. Closed-workload tests, where each virtual user politely waits for a response before sending the next request, flatter your system precisely when it starts to struggle: as responses slow down, the test itself sends fewer requests. This single modeling mistake is why teams pass their load tests and still fall over on the day.
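
    The difference is easy to see with back-of-the-envelope arithmetic. In a closed test, each virtual user completes one request every (latency + think time) seconds, so the load the test generates falls as latency rises. In an open test the arrival rate is a fixed input. The sketch below uses illustrative numbers; the function names, user count, and think time are assumptions, not values from any particular tool.

```python
def closed_offered_rps(users: int, latency_s: float, think_s: float) -> float:
    """Requests/second a closed-loop test generates: each user completes
    one request every (latency + think time) seconds."""
    return users / (latency_s + think_s)


def open_offered_rps(arrival_rps: float, latency_s: float) -> float:
    """An open-loop test keeps sending at the configured arrival rate,
    however slow the responses are (latency_s is deliberately unused)."""
    return arrival_rps


USERS, THINK_S, ARRIVAL_RPS = 1000, 0.75, 1000.0  # illustrative values

for latency_s in (0.25, 1.25, 4.25):
    closed = closed_offered_rps(USERS, latency_s, THINK_S)
    opened = open_offered_rps(ARRIVAL_RPS, latency_s)
    print(f"latency {latency_s:>4}s  closed {closed:6.0f} rps  open {opened:6.0f} rps")
```

    With these numbers, both tests offer 1000 requests per second while the system is healthy. Once latency reaches 4.25 seconds, the closed test is sending only 200 requests per second, one fifth of the load real customers would still be generating.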

    Step 2: Test to failure, not to target

    A test that passes at the target load tells you that you can survive the forecast. It does not tell you what happens when the forecast is wrong - and the forecast is always wrong.

    We run three test tiers:

    1. Target load - the traffic forecast. Must pass with latency SLOs intact.
    2. Headroom load - 2x target. Degradation acceptable, availability intact.
    3. Break test - ramp until something fails, and write down what fails first.
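
    The tiers can be written down as arrival rates derived from the forecast, so every test run is configured from one number. This is a sketch under stated assumptions: the `Tier` structure and `plan_tiers` helper are hypothetical, and the 3x growth factor and last-peak rate are illustrative inputs rather than recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    arrival_rps: float      # for the break tier, this is the starting rate
    pass_criterion: str


def plan_tiers(last_peak_rps: float, growth: float = 3.0,
               headroom: float = 2.0) -> list:
    """Derive the three test tiers from last year's peak arrival rate.

    growth is the forecast multiplier (illustrative default); headroom is
    the multiple of target used for the second tier.
    """
    target = last_peak_rps * growth
    return [
        Tier("target", target, "latency SLOs intact"),
        Tier("headroom", target * headroom,
             "availability intact, degradation acceptable"),
        Tier("break", target * headroom,
             "ramp upward until something fails; record what failed first"),
    ]


tiers = plan_tiers(last_peak_rps=400.0)
for tier in tiers:
    print(f"{tier.name:<9} {tier.arrival_rps:7.0f} rps  {tier.pass_criterion}")
```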

    The break test is the valuable one. Knowing that the connection pool saturates before the CPU does, or that the payment provider's rate limit arrives before your database blinks, converts a mystery outage into a known bottleneck with a runbook.
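
    A break test boils down to a ramp loop that stops at the first failure and records what failed. The sketch below fakes the system with a table of assumed saturation points so it can run on its own; in a real test the `probe` callback would drive load and read metrics instead. All resource names and limits are hypothetical.

```python
# Hypothetical saturation model: the arrival rate (rps) at which each
# resource is exhausted. Real values are observed during the ramp.
ASSUMED_LIMITS_RPS = {
    "db_connection_pool": 3100.0,
    "app_cpu": 4500.0,
    "payment_provider_rate_limit": 2800.0,
}


def saturated(rps: float, limits: dict) -> list:
    """Resources whose assumed limit is exceeded at this arrival rate."""
    return sorted(name for name, limit in limits.items() if rps > limit)


def break_test(start_rps: float, step_rps: float, max_rps: float, probe):
    """Ramp the arrival rate until probe(rps) reports a failure.

    Returns (rps, failed_resources) for the first failing step,
    or None if nothing failed up to max_rps.
    """
    rps = start_rps
    while rps <= max_rps:
        failed = probe(rps)
        if failed:
            return rps, failed
        rps += step_rps
    return None


result = break_test(2400.0, 200.0, 6000.0,
                    lambda rps: saturated(rps, ASSUMED_LIMITS_RPS))
```

    Under these assumed limits the ramp stops at 3000 requests per second with the payment provider's rate limit as the first failure, before the connection pool or CPU is reached. That ordering is what goes into the runbook.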

    Step 3: Rehearse the degradation ladder

    Surviving peak is not about serving every feature at full fidelity. It is about choosing, in advance, what to turn off and in what order. With each client we write a degradation ladder - a ranked list of feature flags:

    1. Disable recommendation widgets (saves the ML inference tier)
    2. Serve category pages from a longer cache TTL
    3. Queue non-critical writes (reviews, wishlists) for later processing
    4. Static fallback for the home page

    Each rung is tested during the load test, not invented during the incident. The rule: every rung must be flippable by the on-call engineer alone, without a deploy, and each rung has a written trigger condition so nobody debates thresholds at 9 a.m. on the day.
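
    One way to keep the ladder and its trigger conditions in a single reviewable place is to express them as data. The following is a sketch only: the flag names, metric names, and thresholds are placeholders, and the helper merely recommends a rung. Flipping the flag stays with the on-call engineer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rung:
    flag: str                          # hypothetical feature-flag name
    action: str
    trigger: Callable[[dict], bool]    # the written trigger condition


# Metric names and thresholds are placeholders, not recommendations.
LADDER = [
    Rung("recs_widgets_off", "Disable recommendation widgets",
         lambda m: m["inference_p99_ms"] > 800),
    Rung("category_long_ttl", "Serve category pages from a longer cache TTL",
         lambda m: m["origin_cpu_pct"] > 75),
    Rung("defer_noncritical_writes", "Queue reviews and wishlists for later",
         lambda m: m["db_write_queue"] > 5000),
    Rung("static_home", "Static fallback for the home page",
         lambda m: m["error_rate_pct"] > 2),
]


def next_rung(metrics: dict, active: set):
    """Walk the ladder in rank order and return the first rung that is not
    yet active and whose trigger fires, or None if no rung applies."""
    for rung in LADDER:
        if rung.flag not in active and rung.trigger(metrics):
            return rung
    return None


metrics = {"inference_p99_ms": 950, "origin_cpu_pct": 80,
           "db_write_queue": 100, "error_rate_pct": 0.5}
first = next_rung(metrics, active=set())
second = next_rung(metrics, active={"recs_widgets_off"})
```

    Because the triggers live next to the flags, the load test can replay recorded metrics through `next_rung` and confirm that each rung fires when expected.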

    Step 4: The game day

    Two weeks before peak we run the full exercise against production capacity: target load, then headroom load, with the actual on-call team responding to the alerts that fire. The load test validates the system; the game day validates the people. It reliably finds the expired dashboard link, the alert routed to a channel nobody reads, and the one engineer who never got production access.

    The playbook in numbers:

    • 2x minimum headroom over forecast
    • 8 weeks of lead time for the full playbook
    • 0 peak-day outages across recent client seasons

    Peak readiness is not a load test. It is a traffic model you believe, a failure point you have already met, and a degradation plan your on-call team has already practiced.
