AI Capabilities Maturity: What to Measure Beyond Adoption

Written by Lauren Lang | May 19, 2026 4:26:21 PM

Uplevel surveyed over 100 engineering leaders and found that 88% rate their organizations as highly prepared for AI. In the same survey, only 2% had a documented AI strategy.

The fundamental error in this data, and in the many organizations replacing devs with Claude, is mistaking confidence for capability. Checking the box on tool rollout doesn’t mean you’ve generated ROI. License utilization, code acceptance rates, and AI-assisted commit volume measure activity. They say almost nothing about whether that activity is generating business value, or what it’s costing you to find out.

Answering those questions requires mature AI capabilities.

What is AI capabilities maturity?

AI capabilities maturity is a measure of how well an organization's full system — technical infrastructure, team design, governance, shared context, and people — is positioned to generate compounding returns from AI investment, at the team and org level.

Maturity is dynamic, not a one-time checkpoint. An organization can score high on adoption and low on system health simultaneously — and that combination carries substantial cost, because spend is accumulating while the return is unclear. The six transformation surfaces that determine whether AI compounds or stalls map directly to what a mature measurement program has to cover.

Why usage metrics give you a false read

Usage going up means engineers are using the tools. Whether the tools are working for the organization is a separate question, and it requires different measurement.

Uplevel's AI Measurement Crisis research found that 50% of engineering leaders want business outcomes from AI, but only 3% use business metrics to evaluate engineering performance. The measurement approach doesn't match the stated goal. Leaders are tracking the input — adoption — and hoping the output takes care of itself.

The cost dimension makes this concrete. AI tool spend scales with usage. Organizations running agents at scale are discovering that token costs accumulate fast — some have been advised to roll back agent deployments because the spend was outpacing any measurable return.

DORA’s 2025 State of AI-Assisted Software Development adds another layer: AI adoption now improves throughput, but it still increases delivery instability. Coding speed is up, but the underlying system hasn't caught up. Activity metrics capture the speed gain and leave the instability invisible — until it shows up as incidents.

How do you measure AI maturity effectively?

A complete picture of AI maturity requires signals across the same dimensions that determine engineering effectiveness broadly, but applied specifically to an AI-augmented environment. The DORA AI Capabilities Model, published in the 2025 DORA report, identifies seven foundational practices that amplify AI impact:

a clear and communicated AI stance
working in small batches
AI-accessible internal data
quality internal platforms
healthy data ecosystems
user-centric focus
strong version control practices

Measuring whether those conditions are working — and diagnosing where they're breaking down — is what Uplevel’s WAVE Framework does. WAVE organizes measurement into four dimensions, each of which has both leading and lagging indicators. This makes it diagnostic rather than just descriptive. The sections below apply each dimension to the AI context specifically.

Gathering the data to determine maturity requires two inputs: system data, and structured developer surveys and interviews. Because the concept of AI capabilities maturity is still evolving, the dimensions that live in human judgment and organizational behavior should be captured qualitatively alongside the quantitative signals. The two instruments cover the same framework from different angles.

What metrics reveal maturity in each dimension?

Ways of Working covers the cultural and behavioral factors that enable delivery, including how teams work with AI. Signals here include which use cases show ROI, what SOPs govern tool usage, whether leadership has aligned on strategy, and whether teams have the psychological safety to experiment and learn.

Alignment connects effort to business goals and tests whether the organization is coordinated around them. Leading indicators here are planning stability, cross-team coherence on AI priorities, and whether engineering and leadership share a measurement framework. The ultimate goal of alignment is to optimize capacity spent on new value work vs. maintenance and incident work. AI can shift that ratio, but only if planning effectiveness and user feedback cycles are healthy enough to direct the recaptured capacity toward work that actually matters.
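As a rough illustration of the capacity ratio described above, the sketch below computes the share of logged engineering hours that goes to each category of work. The category labels, the use of hours as the unit, and the `capacity_shares` helper are all assumptions made for this example; they are not part of WAVE or any Uplevel product.

```python
from collections import defaultdict


def capacity_shares(work_items):
    """Return each category's share of total logged hours.

    work_items is an iterable of (category, hours) pairs. The category
    names are hypothetical labels chosen for this example.
    """
    totals = defaultdict(float)
    for category, hours in work_items:
        totals[category] += hours

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing logged, so no ratio to report
    return {category: hours / grand_total for category, hours in totals.items()}


# Example quarter: illustrative numbers, not real data.
quarter = [
    ("new_value", 30.0),
    ("maintenance", 50.0),
    ("incident", 20.0),
]
shares = capacity_shares(quarter)
```

Tracking these shares quarter over quarter is one simple way to check whether capacity recaptured through AI is actually moving toward new value work.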

Velocity measures how efficiently work moves through your engineering system: throughput rates and the friction points that slow delivery. Uplevel's velocity score combines PR cycle time, PR velocity, issue velocity, and deployment frequency into a view of whether teams are consistently shipping completed work. AI-augmented pipelines add one complication: as AI tool usage increases, PR complexity often doubles for heavy users, extending review cycles even as raw coding speed improves. Velocity metrics read without PR complexity data misread this as a review problem when the actual driver is batching and integration.
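To make the PR complexity point concrete, here is a minimal sketch that reports cycle time two ways: raw, and normalized by PR size. It is not Uplevel's velocity score. The `PullRequest` fields, the use of changed lines as a crude stand-in for complexity, and the sample numbers are all assumptions for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class PullRequest:
    cycle_time_hours: float  # time from open to merge
    lines_changed: int       # crude complexity proxy (an assumption)


def cycle_time_summary(prs):
    """Return the median raw cycle time and median hours per 100 changed lines."""
    if not prs:
        raise ValueError("need at least one pull request")
    raw = median(pr.cycle_time_hours for pr in prs)
    # max(..., 1) guards against division by zero for empty diffs.
    normalized = median(
        pr.cycle_time_hours * 100 / max(pr.lines_changed, 1) for pr in prs
    )
    return {
        "median_cycle_time_hours": raw,
        "median_hours_per_100_lines": normalized,
    }


# Illustrative data: the second group's PRs are twice as large.
before = [PullRequest(10.0, 200), PullRequest(12.0, 240), PullRequest(8.0, 160)]
after = [PullRequest(20.0, 400), PullRequest(24.0, 480), PullRequest(16.0, 320)]

before_summary = cycle_time_summary(before)
after_summary = cycle_time_summary(after)
```

In this made-up data, raw median cycle time doubles while hours per 100 changed lines stays flat, which points at larger batches rather than slower reviewers.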

Environment Efficiency measures how well your engineering system supports productive work — including DORA metrics around recovery speed and code quality, and other indicators of structural friction like ownership clarity or context accuracy. AI coding tools surface problems here: bug rates climb when AI-generated code bypasses thorough review, recovery cycles lengthen when complexity increases faster than testing infrastructure scales, and flow efficiency stalls when code generation accelerates but deployment processes don't.

How do you assess AI capabilities maturity in practice?

A useful AI maturity assessment starts with a quantitative baseline: adoption patterns, delivery health, and quality signals. This establishes where the system is and, importantly, where gains are being absorbed before they reach outcomes. An org seeing strong AI adoption and rising bug rates has a specific diagnosis; an org seeing strong adoption and flat velocity has a different one. The baseline is what makes the diagnosis possible.
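The way a baseline separates one diagnosis from another can be sketched as a small rule set. This is a hypothetical illustration only: the `diagnose` function, the 5% tolerance, and the finding labels are assumptions, and a real assessment would weigh many more signals.

```python
def diagnose(adoption_change, bug_rate_change, velocity_change, tolerance=0.05):
    """Map period-over-period changes (as fractions) to coarse findings.

    Labels are hypothetical:
      "quality" - adoption is up and bug rates are rising
      "flow"    - adoption is up but velocity is flat
    """
    if adoption_change <= tolerance:
        return ["adoption_not_rising"]

    findings = []
    if bug_rate_change > tolerance:
        findings.append("quality")
    if abs(velocity_change) <= tolerance:
        findings.append("flow")
    return findings or ["no_absorption_signal"]


# Adoption up 40%, bug rate up 20%, velocity up 15%.
rising_bugs = diagnose(0.40, 0.20, 0.15)

# Adoption up 40%, bug rate unchanged, velocity up 1%.
flat_velocity = diagnose(0.40, 0.00, 0.01)
```

The two example organizations have the same adoption growth but land on different findings, which is the reason the baseline has to include delivery and quality signals alongside adoption.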

Qualitative signals surface the root causes: why the data looks the way it does. Developer interviews reveal bottlenecks that don't appear in system data — where engineers are correcting AI-assisted work, where context is missing or stale, where governance standards are applied inconsistently in practice. These are the bottlenecks likely to be invisible to leadership until they're expensive. Equally important: the engineers closest to the work should be part of developing solutions. When teams participate in interpreting findings and proposing interventions, the resulting changes are grounded in how the system actually operates — which is what makes them executable.

What the baseline unlocks is the ability to build deliberately. Engineering leaders can sequence investment by leverage — fixing the surfaces that constrain everything else first. Each stage compounds on the last: stronger technical foundations make context investment more effective; better context makes agentic systems more reliable; more reliable agentic systems create the conditions for full organizational capability and a functioning agentic SDLC. The assessment is where that compounding begins.

StackUp is Uplevel's free AI maturity assessment. It maps your organization's current state across the transformation surfaces, identifies where shallow adoption is capping your return, and shows where the foundational work has the most leverage. Engineering teams can run it self-serve; larger organizations receive a 30-minute consulting session to go deeper on findings.

Assess your AI maturity with StackUp →
