Work-in-progress limits are usually well understood and poorly implemented. Teams adopt them, hit them on day 12 (or 6, or 2), and raise them again, which defeats the point entirely. Within a few weeks the limits go back to reflecting reality instead of constraining it, and the experiment gets filed under "didn’t work for us."
But a limit that fills up immediately isn't evidence that the approach is wrong. It's actually the approach working as intended by exposing a bottleneck that was already there. The problem is knowing what to do at that moment instead of reaching for a higher number.
This guide covers how to set WIP limits that reflect actual capacity, what to do when they're breached, and the failure modes that kill most implementations before they take hold.
Before the practice, the principle.
Little's Law is a theorem from queuing theory that connects three variables in any stable workflow system:
Cycle Time = WIP ÷ Throughput
Cycle time is the average time a work item takes to go from started to done. Throughput is how many items the team completes per unit of time. WIP is how many items are in progress simultaneously. The relationship is direct: if throughput holds steady, the only lever that shortens cycle time is reducing WIP.
Here's a concrete example:
A team completes 10 items per week with 20 in progress. Average cycle time: 2 weeks. Hold throughput constant, cut WIP to 10, and cycle time drops to 1 week — same team, same velocity, half the wait.
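The arithmetic above is simple enough to sanity-check in a few lines (a minimal sketch, not tied to any tracking tool):

```python
def cycle_time(wip: float, throughput: float) -> float:
    """Little's Law: average cycle time = WIP / throughput."""
    return wip / throughput

# Team completing 10 items per week with 20 items in progress:
print(cycle_time(wip=20, throughput=10))  # 2.0 weeks

# Same throughput, WIP cut in half:
print(cycle_time(wip=10, throughput=10))  # 1.0 week
```

The units fall out of the inputs: items in progress divided by items per week gives weeks per item.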
That leaves two levers for improving cycle time: increase throughput, or reduce WIP. Historically, increasing throughput meant hiring people or investing in tooling — slow and expensive. AI coding tools may seem to change that equation: if AI accelerates individual output without headcount, maybe throughput goes up cheaply, and WIP limits matter less?
But the 2024 DORA State of DevOps Report showed that a 25% increase in AI adoption was associated with a decrease in delivery throughput and delivery stability at the team level — even as individual productivity metrics improved. The bottlenecks downstream — review queues, testing, cross-team handoffs — don't move at the same rate as code generation.
This is the intuition most managers have backwards. When delivery slows, the instinct is to start more things. Little's Law says otherwise: more things in flight means longer cycle time per item, which means fewer things finish in any given window. Like a highway at rush hour — adding cars doesn't increase throughput, it creates congestion. AI just adds cars faster.
WIP limits are worth the trial and error of implementing them for exactly this reason. They're not a process ritual. They're a response to a mathematical relationship that holds whether your team is shipping manually or generating code with AI assistance.
There's a second mechanism beyond the math. Developers switching between tasks don't switch at zero cost. Research by Dr. Gloria Mark at UC Irvine found that it takes an average of 23 minutes and 15 seconds for the brain to fully rebuild context and regain focus after an interruption.
For developers, the cost is higher than for most knowledge workers — not because they switch more often, but because the mental state they're rebuilding is more complex. The system context, the logic in progress, the decision thread — all of it has to be reconstructed before the work can resume.
High WIP forces this constantly. An engineer with four active items in progress isn't working on four things simultaneously — they're context-switching between them, paying the reconstruction penalty each time. WIP limits reduce the frequency of those switches. The focus benefit isn't soft. It's the difference between 45 minutes of actual coding in a two-hour block and two solid hours.
There's no universal formula for WIP limits. These are starting points, but they’ll likely require iteration.
The most common approach: set each in-progress stage to team size plus one. Five-person team, limit of six. The buffer accommodates a blocked item without leaving someone idle — when one ticket is stuck waiting on a review or a dependency, there's room to pull one more without blowing up the system.
For larger teams, a +2 or +3 buffer is reasonable. The principle holds: one item per person as the baseline, with a small buffer for blockers.
If your team does pair or mob programming on complex work — platform migrations, architecture refactors, etc. — set WIP to the number of pairs, not individuals. One item per pair. This approach produces meaningfully shorter cycle times on the work that benefits from it, because two people focused on one problem move faster than two people each managing their own.
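The starting-point rules above can be sketched as a small helper. The function name and the cutoff for switching from a +1 to a +2 buffer are illustrative assumptions, not from any framework:

```python
def starting_wip_limit(team_size: int, pairing: bool = False) -> int:
    """Starting WIP limit for an in-progress stage.

    Solo work: one item per person plus a small buffer for blocked items.
    Pair/mob work: one item per pair, no buffer.
    """
    if pairing:
        # One item per pair; odd person joins an existing pair
        return max(1, team_size // 2)
    # Assumed cutoff: +1 buffer for teams up to five, +2 beyond that
    buffer = 1 if team_size <= 5 else 2
    return team_size + buffer

print(starting_wip_limit(5))                # 6: five people, +1 buffer
print(starting_wip_limit(8))                # 10: larger team, +2 buffer
print(starting_wip_limit(6, pairing=True))  # 3: three pairs, one item each
```

Treat the output as a first guess to iterate on, not a target to defend.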
Start higher than you think is right. Then drive the number down on purpose.
The goal of a WIP limit program at the outset is to make constraints visible. Setting limits too aggressively on day one creates resistance before the team has built the habit of managing flow. Start permissive, make the system work, then reduce limits deliberately — each reduction will surface a new bottleneck.
The PMI's Disciplined Agile guidance frames this well: the more mature a workflow system becomes, the lower its WIP limit can be. A lower limit on a mature team is a sign of health, not constraint.
Invisible work is the single most common reason WIP limit programs fail to reflect reality. Production support rotations, incident response, architecture reviews, cross-team meetings, migration spikes — if an engineer is doing it, it consumes capacity. A limit that doesn't account for this — and a system that doesn’t measure it — is only capturing a fraction of actual WIP.
The fix: either get this work on the board explicitly, or build it into your baseline limit by calculating available capacity before setting limits. A team of eight engineers where two are always on call isn't a team of eight for planning purposes.
Hitting a limit is the inflection point where most WIP programs succeed or fail.
When a development phase hits its limit, the instinct is to raise the limit "just this once." Especially when people are idle — it feels wasteful to have an engineer waiting when there's clearly work to do. But raising the limit at this point defeats the purpose. The limit filled up because work is moving through that stage more slowly than it's arriving. More cars on the highway won’t fix that.
The right response is to stop pulling new work in and address the constraint.
Practically:
If review is the bottleneck, developers should be doing reviews before pulling new features. Review is part of the job, not a task that belongs exclusively to designated reviewers.
If testing is the bottleneck, look at whether test environments are a constraint, whether tests are being written concurrently with code, and whether the team has established clear "done" criteria that don't create late-stage rework.
If a cross-team dependency is the bottleneck, make the blocked item visible as blocked — with a note on what's needed and from whom — and escalate. Parking the item and starting something else hides the dependency and delays resolution.
The underlying practice: think right-to-left. The priority is always to move what's closest to done across the finish line before pulling something new in. Idle time is uncomfortable but cheap. Unfinished work accumulating in a stage is expensive — it delays everything behind it.
DORA research is direct on this: when people are idle, the right move is to look upstream or downstream for where the constraint is and work on that. The point of WIP limits is to expose problems so they can be fixed — not to manufacture busy work to fill the gap.
These are the common organizational dynamics that sabotage WIP programs before they take hold.
Raising the debt ceiling. Teams hit the limit. They raise it. They hit it again. They raise it again. Eventually, the limits are set high enough that they're never triggered, and WIP tracking becomes a reporting exercise rather than a constraint. If you notice yourself raising a limit because the team keeps hitting it, that's a bottleneck asking to be addressed.
Background tasks. Engineers add standing items — "tech debt cleanup," "documentation sprint," "exploratory spike" — to fill the calendar when the real work is blocked. This looks productive, but it masks the bottleneck that created the idle time. The background task absorbs the signal rather than forcing the team to deal with what's causing it.
Global limits without phase specificity. Setting a single WIP limit across the entire workflow means you lose the diagnostic value. A review bottleneck and a testing environment constraint look identical at the global level — both just show up as "too much in progress."
Individual utilization over team flow. This is the hardest cultural shift. WIP limits optimize for what the team ships, not for whether every individual is busy at all times. A fully occupied team with high WIP often delivers less than a team where some people are temporarily idle while clearing a bottleneck. The metric that matters is throughput and cycle time, not seat utilization.
Three signals tell you whether WIP limits are changing how work actually flows.
The most direct is cycle time. If the average time from started to done is decreasing over four to six weeks, the limits are working. A flat or rising cycle time while the team feels busy usually means WIP is still too high, or invisible work is going uncounted.
WIP age is the leading indicator. Most workflow tools can surface it with aging views or item timestamps. Anything stationary for more than a day or two in an in-progress phase deserves attention before it shows up in cycle time data.
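The WIP-age check is mechanical once you have the date each item entered its current stage. A minimal sketch, with hypothetical ticket IDs and a two-day threshold taken from the rule of thumb above:

```python
from datetime import date

today = date(2024, 3, 20)

# Hypothetical items and the date each entered its current in-progress stage
entered_stage = {
    "PROJ-101": date(2024, 3, 12),  # 8 days in stage
    "PROJ-102": date(2024, 3, 19),  # 1 day in stage
}

# Flag anything stationary for more than two days
aging = {
    item: (today - entered).days
    for item, entered in entered_stage.items()
    if (today - entered).days > 2
}
print(aging)  # {'PROJ-101': 8}
```

Anything the filter flags is a conversation for today's standup, not next retro.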
Consistency in PR velocity is the third signal, and the one most directly tied to stakeholder credibility. Erratic completions — a sprint where the team ships twelve items, followed by one where they ship three — are a sign of high WIP and poor flow. Consistent throughput is the precursor to predictability, and predictability is what makes commitments to stakeholders credible.
One practical note on data: organize by start dates, not finish dates, when analyzing the effects of a WIP limit change. The impact on cycle time will lag by however long items were already in progress when you made the change. Teams that measure by completions often conclude that limits "didn't help" because they're looking at items that entered the system before the change.
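Grouping by start date can be sketched like this, assuming work items carry start and finish timestamps (the dates and the change date are made up for illustration):

```python
from collections import defaultdict
from datetime import date

# Hypothetical items as (start, finish) pairs
items = [
    (date(2024, 3, 4), date(2024, 3, 22)),   # started before the limit change
    (date(2024, 3, 6), date(2024, 3, 25)),
    (date(2024, 3, 18), date(2024, 3, 26)),  # started after the change
    (date(2024, 3, 19), date(2024, 3, 27)),
]
limit_change = date(2024, 3, 15)

# Group cycle times by when each item STARTED, not when it finished
groups = defaultdict(list)
for start, finish in items:
    key = "after change" if start >= limit_change else "before change"
    groups[key].append((finish - start).days)

for key, days in sorted(groups.items()):
    print(key, sum(days) / len(days))
# after change 8.0
# before change 18.5
```

Grouped by finish date instead, the late-March completions would mix both cohorts and dilute the effect of the change.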
WIP limits are diagnostic before they're prescriptive. Their job is to make dysfunction visible — to force the question of why work is accumulating in a particular stage rather than letting it go unnoticed behind a wall of "in progress" tickets.
Teams that can see where work stalls, understand the deeper context, and adopt new behaviors consistently are the ones that actually improve throughput over time, rather than adjusting limits and hoping.