Software Development Analytics: What Your Metrics Are Missing

Written by Lauren Lang | Jun 25, 2026 11:59:33 PM

Most engineering leaders evaluating software development analytics platforms are solving for the same thing: they have data, but it isn't telling them anything useful. Deployment frequency is down. Cycle time is up. The dashboard confirms what the last three retros already surfaced.

Often the working assumption is that better software engineering analytics — different data, more signals, direct feeds, better visualizations, faster reporting — will close that gap.

But the data only reflects the model used to collect it. To collect data that can drive decisions, you may need a different measurement model altogether.

Our 2025 survey found that 66% of engineering leaders track high-level business metrics tied to engineering work. Most of them also track granular developer-level output. But the organizational layer between individual activity and business outcomes goes unmeasured — and the analytics platforms most leaders use were built to answer a different question than the one actually on the table.

That's the model problem.

Why the individual-output model fails

The dominant model of software development analytics treats engineering productivity as primarily a function of individual engineers producing work. It counts what engineers produce — lines of code, tickets, deployments — and uses volume and velocity as proxies for performance.

Consider two teams with similar throughput numbers: PRs merging slowly, cycle times stretched, tickets taking longer than planned. One team is working through genuinely complex technical problems — careful, deliberate work that takes time because the problems are hard. The other is thrashing — absorbing constant requirements changes mid-sprint, blocked by upstream dependencies, context-switching across too many concurrent priorities.

Both teams look like they need to go faster. One needs space to do hard work well. The other needs the organizational constraints around them addressed first. Individual-output metrics can't distinguish those two situations — and when they can't, they can't tell you what to fix.

Frameworks like DORA emerged as attempts to move beyond individual output. DORA metrics measure the output of the pipeline with precision and tell you when something in delivery has shifted, but they surface the downstream effects of systemic bottlenecks, not the conditions producing them. Individual metrics like commits, PR reviews, and now AI token spend are attempts to get closer to root causes and leading indicators.

The problem is the layer they're measuring: individual activity is a proxy for system performance, and a noisy one. The model embedded in the software determines what questions the data can answer. A platform built on individual-output assumptions will surface individual-output signals, regardless of how many integrations it has or how sophisticated the visualizations are.

What a systems-level model measures

A systems-level model treats engineering output as a function of the delivery system — the full chain from requirements to deployment — and measures the conditions that determine whether engineering effort produces outcomes.

Engineering effectiveness research has found that most enterprises use less than a third of their engineering capacity. That figure points at a specific question: where is the other two-thirds going, and what's consuming it?

Here are four organizational patterns (of many) that may provide an answer:

Capacity that lands somewhere other than the roadmap. Uplevel's research puts average time on new value creation at under 20% — one day in five. The rest goes to maintenance, unplanned work, and ad-hoc requests. Output metrics miss this because this work still closes tickets. Tracking actual time allocation against planned priorities catches it — and typically reveals that the delivery problem engineering leaders are trying to solve is actually a planning problem.
Focus time that gets fragmented before the work can compound. For engineers working on complex problems, context-switching across too many concurrent priorities produces a measurable pattern: shorter deep work blocks, higher defect rates, longer completion times on work that should be tractable. That pattern shows up in focus time data before it surfaces in cycle time.
Unstable requirements. Planning instability forces rework that looks, in velocity metrics, like low throughput. The team is producing — it's producing work that gets thrown away or rebuilt. Sprint completion rates and requirements churn track this at the planning layer, upstream of where the cost shows up in delivery data.
Pipeline friction that slows work after the code is written. Manual QA handoffs, review bottlenecks, cross-team dependencies, and fragile CI/CD infrastructure add wait time that compounds across every PR and every release. Process friction in the pipeline itself requires different interventions than anything upstream in the development cycle.
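
To make the first pattern concrete, here is a minimal sketch of comparing planned capacity allocation against where logged time went. The category names, the planned shares, and the hours are all hypothetical values chosen for illustration; this is not Uplevel's method or data, only one simple way such a comparison could be computed.

```python
def allocation_gap(planned_share: dict, logged_hours: dict) -> dict:
    """Return actual share minus planned share for every work category.

    planned_share: category -> planned fraction of capacity (hypothetical plan)
    logged_hours:  category -> hours actually spent (hypothetical time data)
    A negative gap means the category received less capacity than planned.
    """
    total = sum(logged_hours.values())
    if total <= 0:
        raise ValueError("logged_hours must contain a positive total")
    categories = set(planned_share) | set(logged_hours)
    return {
        c: logged_hours.get(c, 0.0) / total - planned_share.get(c, 0.0)
        for c in categories
    }

# Illustrative numbers only: the plan assumed half of capacity on new value.
planned = {"new_value": 0.5, "maintenance": 0.3, "unplanned": 0.2}
logged = {"new_value": 40, "maintenance": 90, "unplanned": 50, "ad_hoc": 20}

# Print categories from most under-served to most over-served.
for category, gap in sorted(allocation_gap(planned, logged).items(), key=lambda kv: kv[1]):
    print(f"{category:12s} {gap:+.0%}")
```

In this made-up example every hour still closes tickets, yet new value creation lands 30 points below plan, which is the kind of gap output metrics would not show.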

These patterns sit behind most of the delivery problems that output and pipeline metrics (PR cycle time, PR velocity, issue velocity, DORA's four measures, bug rate) surface without the context to diagnose them. The systems-level model measures both layers: the conditions producing the patterns, and the delivery signals that show when the patterns have become delivery problems.

	Individual-output model	Systems-level model
What it counts	Commits, PRs merged, tickets closed, story points, AI token spend	Effort allocation, focus time, requirements churn, pipeline friction, environment health
Question it answers	How much did engineering produce?	Why did the system produce that — and what would change it?
What it misses	The organizational conditions that determine whether output translates to outcomes	Nothing inherent — includes output metrics as the delivery layer
When it fails	When the constraint is upstream of the code: planning, requirements, dependencies, focus	When the organization lacks the context to interpret what the patterns mean

AI is making engineering analytics harder

AI compounds both failures of the individual-output model at once.

The first failure is measurement accuracy. When AI is generating a significant share of the code, PR volume and commit frequency track AI throughput, not engineering judgment. A developer directing an agent through a complex architectural problem and one accepting autocomplete suggestions on routine work produce identical output metrics. The signal is decoupled from the work.

The second failure is amplification of what’s already broken. Writing and testing code is only 25–35% of the idea-to-launch timeline — accelerating it alone doesn't move time-to-market much, if at all. AI adoption correlates with higher throughput and higher instability simultaneously. The systemic problems the individual-output model was already blind to — unstable requirements, pipeline fragility, misallocated capacity — don't magically get resolved by AI adoption. They just get faster.
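
The arithmetic behind that 25–35% point can be sketched in a few lines. This is an Amdahl's-law style estimate, and it rests on two assumptions that are not from the research above: that only the coding-and-testing phase gets faster, and that AI speeds that phase up by a hypothetical factor of two. The function name is illustrative.

```python
def timeline_reduction(coding_share: float, coding_speedup: float) -> float:
    """Fraction of the whole idea-to-launch timeline saved when only the
    coding-and-testing phase gets faster (an Amdahl's-law style estimate)."""
    if not 0.0 <= coding_share <= 1.0:
        raise ValueError("coding_share must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    # Time saved is the coding share minus what remains of it after the speedup.
    return coding_share * (1.0 - 1.0 / coding_speedup)

# Hypothetical scenario: AI doubles the speed of writing and testing code.
for share in (0.25, 0.30, 0.35):
    saved = timeline_reduction(share, coding_speedup=2.0)
    print(f"coding share {share:.0%}: timeline shrinks by {saved:.1%}")
```

Under these assumptions, doubling coding speed on a phase that is 30% of the timeline shortens the whole by 15%. Even an unbounded speedup can never save more than the coding share itself, so the remaining 65 to 75% of the timeline is untouched.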

An analytics platform still built on individual-output assumptions will show activity metrics improving while both failures compound underneath. What’s actually needed is a systems-level model that catches them before they show up as missed commitments or degraded output.

How Uplevel approaches software development analytics

Engineering leaders who work with Uplevel arrive with a clear picture of their pipeline metrics and a much murkier picture of what's underneath. DevEx Discovery™ interviews surface the qualitative context that explains why quantitative patterns look the way they do — what's causing planning churn, where dependencies are blocking teams, why deep work time is declining in a specific group. Uplevel organizes those dimensions into a picture with enough context to act on.

Even systems-level measurement infrastructure alone doesn't produce change — any picture of a complex organization still requires contextual understanding and organizational work to act on. That's another gap most analytics implementations leave open. Uplevel adds executive clarity sessions, team solutioning workshops, and monthly insights readouts to build the organizational capacity to sustain improvement over time.

If you want to understand where your organization stands before committing to a platform, StackUp is a free assessment that surfaces the systems-level patterns individual metrics tend to obscure.

Start with StackUp →
