AI Capabilities Maturity: What to Measure Beyond Adoption

Uplevel surveyed over 100 engineering leaders and found that 88% rate their organizations as highly prepared for AI. In the same survey, only 2% had a documented AI strategy.

The mistake on display here, both in this data and in the many organizations replacing developers with Claude, is treating confidence as capability. Checking the box on tool rollout doesn't mean you've generated ROI. License utilization, code acceptance rates, and AI-assisted commit volume measure activity. They say almost nothing about whether that activity is generating business value, or what it's costing you to find out.

Answering those questions requires mature AI capabilities.

What is AI capabilities maturity?

AI capabilities maturity is a measure of how well an organization's full system — technical infrastructure, team design, governance, shared context, and people — is positioned to generate compounding returns from AI investment, at the team and org level.

Maturity is dynamic, not a one-time checkpoint. An organization can score high on adoption and low on system health simultaneously — and that combination carries substantial cost, because spend is accumulating while the return is unclear. The six transformation surfaces that determine whether AI compounds or stalls map directly to what a mature measurement program has to cover.

Agentic transformation surfaces: context ecosystem, technical foundation, agentic systems, engineer skills and capabilities, team design and org structure, and governance and quality

Why usage metrics give you a false read

Usage going up means engineers are using the tools. Whether the tools are working for the organization is a separate question, and it requires different measurement.

Uplevel's AI Measurement Crisis research found that 50% of engineering leaders want business outcomes from AI, but only 3% use business metrics to evaluate engineering performance. The measurement approach doesn't match the stated goal. Leaders are tracking the input — adoption — and hoping the output takes care of itself.

The cost dimension makes this concrete. AI tool spend scales with usage. Organizations running agents at scale are discovering that token costs accumulate fast — some have been advised to roll back agent deployments because the spend was outpacing any measurable return.

When the primary metric is adoption volume, there's nothing in the measurement system to catch runaway token spend before the CFO does.
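
One way to give the measurement system that catch is to track spend per unit of shipped work rather than spend or usage alone. The sketch below is a minimal illustration, not Uplevel's method: the `WeeklySnapshot` fields, the choice of merged PRs as the output unit, and the 1.25x tolerance are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WeeklySnapshot:
    """One week of hypothetical AI cost and delivery data."""
    token_spend_usd: float
    merged_prs: int


def spend_outpacing_output(history: list[WeeklySnapshot], tolerance: float = 1.25) -> bool:
    """Flag when the latest week's spend per merged PR exceeds the
    earlier weeks' pooled spend per merged PR by more than `tolerance`x.

    The tolerance is an illustrative threshold, not a recommended value.
    """
    if len(history) < 2:
        return False  # nothing to compare against yet

    *earlier, latest = history
    earlier_prs = sum(week.merged_prs for week in earlier)
    if earlier_prs == 0:
        return False  # no baseline unit cost can be computed

    baseline_unit_cost = sum(week.token_spend_usd for week in earlier) / earlier_prs

    if latest.merged_prs == 0:
        # Money was spent and nothing shipped: always worth a look.
        return latest.token_spend_usd > 0

    latest_unit_cost = latest.token_spend_usd / latest.merged_prs
    return latest_unit_cost > baseline_unit_cost * tolerance


history = [
    WeeklySnapshot(token_spend_usd=1000.0, merged_prs=50),
    WeeklySnapshot(token_spend_usd=1200.0, merged_prs=55),
    WeeklySnapshot(token_spend_usd=2400.0, merged_prs=52),
]
print(spend_outpacing_output(history))  # True
```

In this made-up history, usage and spend both rise in the last week, so an adoption dashboard would look healthy. The unit cost roughly doubles, and that is the signal the flag surfaces.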

DORA’s 2025 State of AI-Assisted Software Development adds another layer: AI adoption now improves throughput, but it still increases delivery instability. Coding speed is up, but the underlying system hasn't caught up. Activity metrics capture the speed gain and leave the instability invisible — until it shows up as incidents.

How do you measure AI maturity effectively?

A complete picture of AI maturity requires signals across the same dimensions that determine engineering effectiveness broadly, but applied specifically to an AI-augmented environment. The DORA AI Capabilities Model, published in the 2025 DORA report, identifies seven foundational practices that amplify AI impact:

a clear and communicated AI stance
working in small batches
AI-accessible internal data
quality internal platforms
healthy data ecosystems
user-centric focus
strong version control practices

Measuring whether those conditions are working — and diagnosing where they're breaking down — is what Uplevel’s WAVE Framework does. WAVE organizes measurement into four dimensions, each of which has both leading and lagging indicators. This makes it diagnostic rather than just descriptive. The sections below apply each dimension to the AI context specifically.

Gathering the data to determine maturity takes two inputs: system data, and structured developer surveys and interviews. The concept of AI capabilities maturity is still evolving, so the dimensions that live in human judgment and organizational behavior need qualitative measurement alongside the quantitative signals. The two instruments cover the same framework from different angles.

What metrics reveal maturity in each dimension?

Ways of Working covers the cultural and behavioral factors that enable delivery, including how teams work with AI. It captures which use cases show ROI, what SOPs govern tool usage, whether leadership has aligned on strategy, and whether teams have the psychological safety to experiment and learn.

Alignment connects effort to business goals and tests whether the organization is coordinated around them. Leading indicators here are planning stability, cross-team coherence on AI priorities, and whether engineering and leadership share a measurement framework. The ultimate goal of alignment is to optimize capacity spent on new value work vs. maintenance and incident work. AI can shift that ratio, but only if planning effectiveness and user feedback cycles are healthy enough to direct the recaptured capacity toward work that actually matters.
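
The new-value versus maintenance ratio can be made concrete with a small calculation. This is a sketch under stated assumptions: the category labels (`new_value`, `maintenance`, `incident`) and the effort figures are hypothetical, and a real tracker would supply its own taxonomy and units.

```python
from collections import defaultdict


def allocation_shares(work_items):
    """Return each category's share of total logged effort.

    work_items: iterable of (category, effort) pairs, with effort in any
    consistent unit (hours, story points). Category names are whatever
    the caller's tracker uses; the ones below are made up.
    """
    totals = defaultdict(float)
    for category, effort in work_items:
        totals[category] += effort

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # no effort logged, so no shares to report

    return {category: effort / grand_total for category, effort in totals.items()}


quarter = [("new_value", 55), ("maintenance", 30), ("incident", 15)]
shares = allocation_shares(quarter)
print(round(shares["new_value"], 2))  # 0.55
```

Comparing this share across quarters, before and after an AI rollout, is one way to see whether recaptured capacity is actually moving toward new value work.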

Velocity measures how efficiently work moves through your engineering system: throughput rates and the friction points that slow delivery. Uplevel's velocity score combines PR cycle time, PR velocity, issue velocity, and deployment frequency into a view of whether teams are consistently shipping completed work. AI-augmented pipelines add one complication: as AI tool usage increases, PR complexity often doubles for high users, extending review cycles even as raw coding speed improves. Read without PR complexity data, velocity metrics make this look like a review problem when the actual driver is batching and integration.
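
A toy comparison shows how the same review data reads differently once PR size is accounted for. The sketch uses lines changed as a stand-in for PR complexity, which is an assumption made for illustration and not how Uplevel defines complexity. The sample numbers are invented.

```python
from statistics import median


def median_review_hours(prs):
    """Median raw review time. prs is a list of (review_hours, lines_changed)."""
    return median(hours for hours, _ in prs)


def median_review_hours_per_100_lines(prs):
    """Median review time normalized by PR size (hours per 100 changed lines)."""
    return median(hours * 100 / lines for hours, lines in prs if lines > 0)


# Hypothetical samples: PRs double in size after heavy AI tool usage.
before = [(4, 200), (6, 300), (5, 250)]
after = [(8, 400), (12, 600), (10, 500)]

print(median_review_hours(before), median_review_hours(after))  # 5 10
print(median_review_hours_per_100_lines(before),
      median_review_hours_per_100_lines(after))  # 2.0 2.0
```

Raw review time doubles, which looks like reviewers slowing down. Normalized by size, review effort per line is unchanged, which points at batch size instead.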

Environment Efficiency measures how well your engineering system supports productive work — including DORA metrics around recovery speed and code quality, and other indicators of structural friction like ownership clarity or context accuracy. AI coding tools surface problems here: bug rates climb when AI-generated code bypasses thorough review, recovery cycles lengthen when complexity increases faster than testing infrastructure scales, and flow efficiency stalls when code generation accelerates but deployment processes don't.

How do you assess AI capabilities maturity in practice?

A useful AI maturity assessment starts with a quantitative baseline: adoption patterns, delivery health, and quality signals. This establishes where the system is and, importantly, where gains are being absorbed before they reach outcomes. An org seeing strong AI adoption and rising bug rates has a specific diagnosis; an org seeing strong adoption and flat velocity has a different one. The baseline is what makes the diagnosis possible.

Qualitative signals surface the root causes: why the data looks the way it does. Developer interviews reveal bottlenecks that don't appear in system data — where engineers are correcting AI-assisted work, where context is missing or stale, where governance standards are applied inconsistently in practice. These are the bottlenecks likely to be invisible to leadership until they're expensive. Equally important: the engineers closest to the work should be part of developing solutions. When teams participate in interpreting findings and proposing interventions, the resulting changes are grounded in how the system actually operates — which is what makes them executable.

Organizations that skip any part of this tend to deploy solutions into problems they haven't accurately diagnosed, which is how you get a second expensive pilot on top of a struggling first one.

What the baseline unlocks is the ability to build deliberately. Engineering leaders can sequence investment by leverage — fixing the surfaces that constrain everything else first. Each stage compounds on the last: stronger technical foundations make context investment more effective; better context makes agentic systems more reliable; more reliable agentic systems create the conditions for full organizational capability and a functioning agentic SDLC. The assessment is where that compounding begins.

StackUp is Uplevel's free AI maturity assessment. It maps your organization's current state across the transformation surfaces, identifies where shallow adoption is capping your return, and shows where the foundational work has the most leverage. Engineering teams can run it self-serve; larger organizations receive a 30-minute consulting session to go deeper on findings.

Assess your AI maturity with StackUp →

FAQ

What metrics should you use to assess AI capabilities maturity?

The metrics that matter span four dimensions: how teams are working with AI day-to-day (Ways of Working), whether AI effort connects to business goals (Alignment), how delivery throughput and stability are holding up (Velocity), and the quality of the environment AI and engineers are working within (Environment Efficiency). Adoption metrics — license usage, code acceptance rates — cover a slice of Ways of Working. A complete picture requires leading and lagging indicators across all four.

What is an AI maturity model for software engineering?

An AI maturity model is a framework for assessing how well an engineering organization's full system is positioned to generate compounding returns from AI investment. A useful model covers technical infrastructure, team design, governance, shared context, and business outcomes — and tracks both leading indicators (inputs that drive performance) and lagging indicators (results). Maturity is dynamic: it changes as AI tooling evolves, as organizational conditions shift, and as the gaps between adoption and impact widen or close.

What's the difference between AI adoption and maturity?

AI adoption measures whether engineers are using AI tools. AI maturity measures whether the organization's system is set up for that usage to generate business value. High adoption with low maturity is the pattern most likely to produce substantial cost accumulation — token spend, rework, instability — without corresponding return. The two metrics can move in opposite directions.

How do you measure AI capabilities in an engineering organization?

Measurement requires quantitative signals from engineering systems — adoption patterns, delivery health, quality indicators — combined with qualitative signals from the engineers doing the work. Quantitative data shows what is happening. Qualitative interviews show why. Root causes that live in developer judgment, context quality, and governance practices require qualitative instruments to surface.

What does a good AI maturity assessment include?

A baseline across four dimensions: Ways of Working, Alignment, Velocity, and Environment Efficiency. Each dimension has both leading and lagging indicators. The quantitative baseline establishes where the system is; structured developer interviews add the root causes. The output should sequence improvement by leverage: first identify which surfaces have the most impact on the others, then decide which to address first.

How often should you reassess AI capabilities maturity?

AI tooling and organizational conditions both change fast enough that a once-a-year snapshot is likely to be stale by the time you act on it. Leading indicators — adoption patterns, PR cycle time, bug rates — are worth tracking continuously. A full reassessment across all four WAVE dimensions every quarter gives enough signal to course-correct before problems compound. An unscheduled reassessment is warranted after any significant tooling change, team restructure, or meaningful shift in delivery stability.

How do you know if your AI strategy is working?

The most reliable signal is business outcomes: cycle time from ideation to delivery, capacity recovered from unplanned work, delivery stability under AI-accelerated conditions. These are the metrics most leaders say they want and almost none track systematically for AI specifically. The leading indicators — planning stability, PR cycle time, bug rates, developer confidence in AI outputs — are the early-warning system that tells you whether those outcomes are building or eroding before you see it in revenue.


Lauren Lang
