
How to Find Engineering Bottlenecks

Written by Lauren Lang | Oct 21, 2025 6:30:25 PM

Most engineering leaders know the feeling: teams are busy, but delivery still drags. In large or complex organizations, the root cause is rarely easy to find. Stack Overflow's 2024 Developer Survey, for example, found that more than half of developers feel slowed down by waiting for information. These aren't just technical hiccups; they're signs of deeper, systemic bottlenecks.

Despite all the dashboards, most organizations still struggle to pinpoint what's actually slowing them down. McKinsey distinguishes between developers' "inner loop" of core work, where company leadership wants developers to spend roughly 70% of their time, and "outer loop" tasks like dependency wrangling, integration, and setup, which in reality make up the majority of engineering work.

The gap between what's measured and what matters is real — and identifying that gap is what separates high-performing organizations from the rest.

But first, it's important to recognize that technical-only diagnosis has limits.

It's easy to focus on what's visible, like code output, PR review times, and deployment frequency. But these metrics only tell part of the story. This is especially important to remember when it comes to AI, which tends to amplify what already exists. High-performing teams get faster, while dysfunctional ones get more chaotic. If you're only looking at technical metrics, you'll miss the real constraints.

What is a sociotechnical system?

Modern engineering organizations are classic sociotechnical systems. For example, a slow CI/CD pipeline might be a technical bottleneck, but work can also stall because teams rely solely on Jira tickets, leading to miscommunication and rework. DORA's research shows that bottlenecks often span both technical and organizational boundaries. 

Symptoms vs. root causes

Missed deadlines, slow delivery, high bug rates, and frustrated teams are symptoms, not causes. The real challenge is "dashboard blindness" — confusing what's easy to see with what's actually important. Nearly half of platform teams don't measure success at all, and a quarter collect data but never analyze it. Most teams are stuck treating symptoms.

For example, high bug rates might show up in your metrics, but the root cause could be constant interruptions or rushed reviews. Slow delivery might look like a tooling problem, but often it's about unclear requirements or fragmented workflows. 

Common symptoms and their possible root causes:

  • Missed deadlines: unclear requirements, hidden dependencies
  • Slow delivery: excessive handoffs, fragmented tooling, context switching
  • High bug rates: rushed reviews, lack of protected focus time
  • Rework: poor requirement validation, misaligned stakeholders
  • Low morale: reactive work culture, lack of autonomy

Tools that surface engineering bottlenecks

Engineering intelligence platforms have become essential for gaining visibility into where work gets stuck. They provide system-level metrics on cycle time, deployment frequency, PR review patterns, and developer time allocation. 

Surfacing patterns that would otherwise remain invisible, engineering intelligence platforms answer questions like: Where is work waiting? Which teams have the longest cycle times? How much time do developers spend in meetings versus focused work?
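
As a rough illustration (not Uplevel's actual implementation), here is a minimal sketch of the kind of roll-up such a platform computes, using hypothetical exported work-item records; the field names, teams, and dates are invented.

```python
# Minimal sketch: average cycle time per team from hypothetical exported records.
from collections import defaultdict
from datetime import datetime

# Hypothetical export: one record per completed work item.
work_items = [
    {"team": "payments", "started": "2025-09-01", "deployed": "2025-09-18"},
    {"team": "payments", "started": "2025-09-05", "deployed": "2025-09-26"},
    {"team": "platform", "started": "2025-09-03", "deployed": "2025-09-09"},
]

def days_between(start: str, end: str) -> int:
    fmt = "%Y-%m-%d"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).days

cycle_times = defaultdict(list)
for item in work_items:
    cycle_times[item["team"]].append(days_between(item["started"], item["deployed"]))

# Rank teams by average cycle time to see where work waits longest.
for team, times in sorted(cycle_times.items(), key=lambda kv: -sum(kv[1]) / len(kv[1])):
    print(f"{team}: average cycle time {sum(times) / len(times):.1f} days")
```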

But here's what most can't do: tell you why those patterns exist or what to do about them.

No dashboard can reveal that your cycle time is slow because product requirements are ambiguous, or that your bug rate is high because junior engineers are afraid to push code frequently. Software shows you the symptoms; humans diagnose the disease.

This is where Uplevel takes a different approach. Uplevel's WAVE Framework helps managers dig deeper by combining hard data with team context, because real diagnosis requires both. We then pair that visibility with hands-on change enablement through the Uplevel Method.

Rather than leaving leaders to interpret dashboards alone, Uplevel's team works alongside engineering leadership to ask the right questions, involve the right people, and design interventions that address root causes rather than symptoms. The platform provides the data; the Method provides the "why" and the "what to do about it."

The Uplevel approach: holistic bottleneck diagnosis

The WAVE Framework — Ways of Working, Alignment, Velocity, and Environmental Efficiency — offers a practical way to see the whole system.

  • Ways of Working covers deep work, team health, and AI maturity.
  • Alignment tracks how well engineering effort maps to business outcomes.
  • Velocity measures how work moves, including cycle time, handoffs, and PR reviews.
  • Environmental Efficiency looks at system support, code quality, and friction points.

Using WAVE to guide bottleneck discovery, Uplevel pairs data with change management through the Uplevel Method. Within the Method, we help engineering organizations establish a baseline, focus on one area first, and connect measurement to real change.

This is especially important as AI adoption grows. Without mature systems, AI just creates isolated pockets of productivity and sometimes more instability.

How to diagnose bottlenecks: (not) a checklist

Diagnosing bottlenecks isn't a linear process. You'll revisit steps as new information and challenges emerge. Treat this as an iterative loop.

1. Sense performance issues

Look for recurring symptoms: missed deadlines, rework, slow delivery, high bug rates, or team frustration. These signals are often subtle or politically sensitive. Building trust is key, since teams need to feel safe surfacing real problems.

Google's Project Aristotle found that psychological safety — the belief that you won't be punished for mistakes — matters more than individual talent in determining team effectiveness. Amy Edmondson's research at Harvard showed that better hospital teams don't make fewer errors; they're simply more willing to report them. The same applies to engineering: your best teams might look worse in metrics if others hide problems. Create multiple sensing channels: short pulse checks, skip-level meetings, and anonymous surveys. The goal isn't perfection but early detection before small issues compound into crises.

2. Gather and interpret data

Collect metrics like cycle time, PR review time, deep work hours, and incident volume. But quantitative data is rarely the whole story. For example, Google's 2025 DORA report shows that while over 80% of developers feel more productive with AI, 30% don't trust AI-generated code, and instability can rise if the system is already shaky. To get the big picture, talk to managers, survey teams, run solutioning workshops, and get stakeholder feedback.
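
To make one of those metrics concrete, here is a hedged sketch of estimating deep work hours from a hypothetical calendar export, counting uninterrupted gaps of two or more hours between meetings as focus time. The data and the two-hour rule are assumptions, not Uplevel's method.

```python
from datetime import datetime

# One developer's hypothetical workday and meeting blocks.
WORKDAY = (datetime(2025, 10, 20, 9), datetime(2025, 10, 20, 17))
meetings = [
    (datetime(2025, 10, 20, 9, 30), datetime(2025, 10, 20, 10, 0)),
    (datetime(2025, 10, 20, 13, 0), datetime(2025, 10, 20, 14, 0)),
    (datetime(2025, 10, 20, 16, 30), datetime(2025, 10, 20, 17, 0)),
]

MIN_FOCUS_HOURS = 2.0  # only gaps this long count as deep work
deep_work = 0.0
cursor = WORKDAY[0]
for start, end in sorted(meetings) + [(WORKDAY[1], WORKDAY[1])]:
    gap = (start - cursor).total_seconds() / 3600
    if gap >= MIN_FOCUS_HOURS:
        deep_work += gap
    cursor = max(cursor, end)

print(f"Estimated deep work: {deep_work:.1f} h")  # 10:00-13:00 and 14:00-16:30 qualify -> 5.5 h
```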

3. Map workflows and value streams

Start by tracing a recent feature from concept to production. Document every handoff: product to engineering, backend to frontend, dev to QA, code complete to deployed. At each transition, note the wait time — not just the active work time. You'll often find that work spends 80% of its time waiting and only 20% being actively worked on. These wait states are your primary optimization targets.
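
Here is a back-of-the-envelope version of that accounting for a single traced feature. The stages and hours are hypothetical, purely to show how the wait-versus-active split falls out.

```python
# Hypothetical trace of one feature: (stage, active hours, wait hours before the next stage).
stages = [
    ("product spec -> engineering", 6, 40),
    ("backend -> frontend handoff", 16, 24),
    ("dev complete -> QA",          8, 56),
    ("code complete -> deployed",   2, 30),
]

active = sum(a for _, a, _ in stages)
waiting = sum(w for _, _, w in stages)
total = active + waiting

print(f"Active work: {active} h ({active / total:.0%})")
print(f"Waiting:     {waiting} h ({waiting / total:.0%})")

# The largest wait states are the primary optimization targets.
for stage, _, wait in sorted(stages, key=lambda s: -s[2]):
    print(f"{stage}: {wait} h of queue time")
```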

Look for common patterns that signal bottlenecks: work that routinely comes back for clarification (requirements gaps), features that sit "done" but not deployed (release process constraints), PRs waiting days for review (capacity or priority misalignment), and dependencies on specific people rather than teams (knowledge silos). Pay special attention to handoffs between teams with different managers—these organizational boundaries often create the most friction because no single person owns the end-to-end flow.
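
One of those patterns is easy to check mechanically. The sketch below flags PRs that waited longer than a chosen threshold for a first review; the records, field names, and 48-hour threshold are assumptions for illustration.

```python
from datetime import datetime, timezone

# Hypothetical PR records exported from a code host.
prs = [
    {"id": 4811, "team": "team-a", "opened": "2025-10-13T09:00:00+00:00", "first_review": None},
    {"id": 4812, "team": "team-b", "opened": "2025-10-17T15:30:00+00:00", "first_review": "2025-10-17T16:10:00+00:00"},
]

REVIEW_SLA_HOURS = 48
now = datetime(2025, 10, 20, tzinfo=timezone.utc)  # fixed "now" so the example is reproducible

for pr in prs:
    opened = datetime.fromisoformat(pr["opened"])
    reviewed = datetime.fromisoformat(pr["first_review"]) if pr["first_review"] else now
    wait_hours = (reviewed - opened).total_seconds() / 3600
    if wait_hours > REVIEW_SLA_HOURS:
        print(f"PR {pr['id']} ({pr['team']}) waited {wait_hours:.0f} h for a first review")
```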

The uncomfortable part: value stream mapping exposes where accountability is unclear, where teams optimize locally at the expense of the system, and where status incentives conflict with flow. A team might look highly productive by their metrics while creating bottlenecks for everyone downstream. The goal isn't blame—it's making the invisible visible so you can optimize the whole system, not just the parts.

4. Diagnose root causes

Root cause analysis is iterative. Use tools like the "Five Whys" and involve people from across the org. Be ready to find that the real constraint is outside your team — maybe in another group, a legacy process, or even leadership incentives. Adding more process or documentation rarely fixes systemic issues. The goal is understanding the actual constraint, not creating more overhead.

Apply Theory of Constraints thinking: every system has at least one bottleneck limiting throughput, and improving anything else is an illusion. Is your constraint specialized knowledge holders, code review capacity, testing environments, or deployment pipelines? Once identified, focus ruthlessly on exploiting that constraint before adding capacity elsewhere. Cross-organizational bottlenecks are particularly insidious because they're invisible to team-level metrics — value stream mapping makes these visible by showing where work accumulates at boundaries between teams.
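
A toy illustration of that idea, with made-up stage names and weekly throughput numbers: the stage with the lowest throughput caps the whole system, so capacity added anywhere else only grows the queue in front of it.

```python
# Hypothetical weekly throughput (work items each stage can process).
weekly_throughput = {
    "requirements refinement": 25,
    "implementation": 18,
    "code review": 9,
    "testing environments": 12,
    "deployment pipeline": 20,
}

constraint = min(weekly_throughput, key=weekly_throughput.get)
print(f"System throughput is capped at {weekly_throughput[constraint]} items/week by: {constraint}")
# Adding implementation headcount here would not raise throughput; it would
# only lengthen the queue waiting for code review.
```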

5. Prioritize interventions

Weigh impact and effort, but know that you won't always have perfect information. Prioritization is a negotiation—tradeoffs are inevitable. Focus on fixes that address the system, not just one team or metric, and be ready to adjust as you learn.

Don Reinertsen's Cost of Delay framework offers a powerful prioritization tool: if you only quantify one thing, quantify what it costs to delay each initiative by one month. Maersk Line found one feature spent 38 weeks in queues with $200,000 weekly delay cost — $8M in lost revenue from waiting! WSJF (Weighted Shortest Job First) operationalizes this by dividing Cost of Delay by job duration, helping you sequence work to minimize cumulative delay rather than maximizing resource utilization.
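
A minimal WSJF sketch, assuming you can put a rough monthly cost of delay and a duration on each initiative (the initiatives and numbers below are invented):

```python
# WSJF = cost of delay / job duration; schedule the highest ratio first.
initiatives = [
    {"name": "checkout redesign",            "cod_per_month": 300_000, "months": 6},
    {"name": "self-serve test environments", "cod_per_month": 120_000, "months": 2},
    {"name": "flaky-test cleanup",           "cod_per_month": 40_000,  "months": 0.5},
]

for item in initiatives:
    item["wsjf"] = item["cod_per_month"] / item["months"]

for item in sorted(initiatives, key=lambda i: -i["wsjf"]):
    print(f"{item['name']}: WSJF = {item['wsjf']:,.0f}")
```

Note how the short, cheap-to-finish cleanup jumps ahead of the larger redesign even though its absolute cost of delay is lower; sequencing by WSJF minimizes cumulative delay cost rather than starting the "biggest" item first.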

For engineering improvements specifically, translate value into concrete terms: the value of automation is the manual effort it eliminates, and the value of infrastructure improvements shows up as faster incident response and expanded team capacity.

6. Implement and measure change

Change rarely lands cleanly. Expect resistance and setbacks. Use both leading (deep work, handoff quality) and lagging (cycle time, bug rate) indicators to track progress. Adjust as you go.

Research shows engineers form attitudes toward change collectively, shaped by team social norms, rather than individually. That means traditional change management focused on individual adoption misses the fundamental social dynamics.

Microsoft's SPACE framework emphasizes measuring across five dimensions: Satisfaction and well-being, Performance, Activity, Communication and collaboration, and Efficiency and flow. The critical insight: no single dimension captures the full picture. Leading indicators like developer satisfaction and collaboration quality predict future performance but are harder to measure; lagging indicators like DORA metrics are easy to measure but hard to influence.

Use 3-8 KPIs combining both types, and when quantitative and qualitative data disagree, investigate: the quantitative is often wrong. Implement improvements in small cycles with retrospectives, learning as you go rather than attempting waterfall organizational change.
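
A hedged sketch of what a small, mixed KPI set might look like in practice; the indicator names, targets, and values are invented for illustration.

```python
# 3-8 KPIs mixing leading and lagging indicators; review them together, not in isolation.
kpis = [
    {"name": "deep work hours per dev per week", "type": "leading", "target": 15,  "current": 11,  "higher_is_better": True},
    {"name": "developer satisfaction (1-5)",     "type": "leading", "target": 4.0, "current": 3.4, "higher_is_better": True},
    {"name": "PR first-review wait (hours)",     "type": "leading", "target": 8,   "current": 26,  "higher_is_better": False},
    {"name": "cycle time (days)",                "type": "lagging", "target": 7,   "current": 12,  "higher_is_better": False},
    {"name": "change failure rate (%)",          "type": "lagging", "target": 10,  "current": 14,  "higher_is_better": False},
]

for kpi in kpis:
    on_track = (kpi["current"] >= kpi["target"]) if kpi["higher_is_better"] else (kpi["current"] <= kpi["target"])
    status = "on track" if on_track else "investigate"
    print(f"[{kpi['type']:>7}] {kpi['name']}: {kpi['current']} (target {kpi['target']}) -> {status}")
```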

How real-world organizations find and fix engineering bottlenecks

The diagnostic loop becomes concrete when you see it in action. The following organizations faced different symptoms—invisible friction, unplanned work overload, unclear resource allocation, and infrastructure migration risk—but each used a similar approach: combine data with context, diagnose the actual constraint, and implement targeted interventions. Here's what they learned.

Accolade: from invisible friction to 20% more focus time


Accolade's 300-person engineering team faced a common problem at scale: leaders suspected interruptions and context-switching were slowing delivery, but couldn't pinpoint the impact. Using Uplevel to surface deep work patterns and PR complexity, they discovered a near-perfect correlation between low focus time and disengagement scores from developer surveys.

The diagnosis revealed two root causes: excessive meetings fragmented the day, and junior engineers hesitated to push code frequently enough, creating large, risky PRs.

Accolade implemented frameworks for protecting focus time and training to reduce fear around testing and shipping. Within a year, deep work increased 20% and deployments rose 205%, proving that addressing invisible friction creates measurable outcomes.

Xactly: unplanned work was eating 17% of engineering capacity


Xactly's leadership knew delivery felt slower than it should, but without comprehensive visibility, they couldn't quantify the problem. Standard tools tracked tickets and commits but missed how much time disappeared into Slack conversations and ad hoc meetings.

Uplevel revealed the full picture: a large volume of unplanned work constantly pulled developers away from planned roadmap items. Additionally, data showed that complex PRs were being reviewed in just five minutes — a clear quality risk given the team's high bug rate.

By addressing these systemic issues — rotating incident response to protect focus time, and implementing code review training to improve quality standards — Xactly increased available dev time by 17% without adding headcount. The bug rate dropped, and teams could finally deliver on planned work.

Avalara: measuring before migrating to prove infrastructure ROI

When Avalara's VP of Engineering Matt Buckley evaluated migrating to GitLab as a unified DevSecOps platform, he faced a common challenge: proving the ROI of infrastructure changes before implementation. In an environment processing billions in tax remittance, deployment decisions carried massive liability risk.

Buckley used Uplevel to establish comprehensive baseline metrics before any changes. The data revealed that merge requests were taking weeks to reach production due to toolchain complexity — an optimization opportunity that would have remained invisible without measurement.

After implementing GitLab's shared CI/CD pipelines and containerizing applications, the results were dramatic: deployment frequency improved 1,100%, cycle time dropped from 4 weeks to 3 hours, and throughput increased 2.75x above industry benchmarks. Critically, quality metrics stayed stable throughout the transformation.

The measurement approach converted infrastructure experiments into justified budget decisions. When other teams saw the quantified results, mission-critical systems handling massive financial exposure migrated to the new platform within months. As Buckley notes, "The metrics clearly demonstrated progression" to both engineering and business stakeholders.

The engineering leader's new role: system optimizer

Diagnosing bottlenecks is a leadership job. It means using root cause analysis, combining data with context, and prioritizing interventions that move the business. The biggest wins often cross team and functional boundaries. As AI becomes more embedded in engineering, its impact will depend on the maturity of your systems and culture.

Google's research is clear: system-level visibility and measurement drive real improvement. The real work is sensemaking, facilitation, and adaptive leadership. Use frameworks like WAVE, stay close to the data and the people, and keep the feedback loop open. Bottlenecks are inevitable, but with the right approach, they become opportunities for real improvement.