
Google’s DevOps Research and Assessment (DORA) team gathered survey data from more than 31,000 developers over the course of five years, culminating in the 2019 State of DevOps report and the accompanying book Accelerate, which outline the team’s findings and methodology. Both publications identified four key metrics (with a fifth added in 2021) for measuring software delivery performance, a set of core DevOps capabilities that enable optimal performance, and the high-level organizational outcomes that those capabilities and metrics support.

The “DORA metrics” (in green in the diagram below) have been widely adopted across large organizations and, with Google’s weight behind them, are considered an industry standard. As DORA is discussed and applied today, the name is, in most cases, synonymous with these key metrics.

The problem? They only tell half the story.

dora-metrics-core-model

Nearly a decade after the research kicked off, forward-thinking companies, DevOps experts, and even DORA team members themselves are pulling back on the reins. It’s not that the five metrics are wrong; it’s that they measure efficiency more than overall performance. And efficiency is only one part of the DORA Core Model, which looks at engineering effectiveness holistically and in context.

That’s why, at Uplevel, we look at DORA metrics — but we look at them as part of a larger picture.

DORA Engineering Metrics: What Are They and What Do They Measure?

There are five DORA DevOps metrics that fall into three main categories outlined by the DORA team:

  • Throughput (velocity) metrics give leaders a picture of delivery during the development process and a way to measure velocity outside of arbitrary measurements like story points.
  • Stability (quality) metrics speak to the quality of the deployment processes.
  • Reliability is often overlooked (you'll still see many references to the "four" DORA metrics), but was added later as its own category and metric.

Lead time for changes indicates the amount of time between code being committed and that code being deployed to production. Are automated tests and a healthy CI/CD pipeline reducing the time a pull request sits in the queue? Is there a healthy code review process for what must be done manually?

Deployment frequency quantifies how often teams successfully release code to production. DevOps best practice recommends that the total number of deployments be high: deploying to production more frequently is a byproduct of streamlined processes and less complex PRs, which indicate healthy code reviews and more opportunities to catch errors.
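
Both throughput metrics can be derived from a log of production deployments. Here is a minimal sketch, assuming you can export each deployment with the timestamp of its earliest commit (the records and field names below are hypothetical):

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical export: one record per production deployment, carrying the
# timestamp of the earliest commit it contains and the deploy timestamp.
deployments = [
    {"committed_at": datetime(2024, 5, 1, 9, 30), "deployed_at": datetime(2024, 5, 1, 15, 0)},
    {"committed_at": datetime(2024, 5, 2, 11, 0), "deployed_at": datetime(2024, 5, 3, 10, 0)},
    {"committed_at": datetime(2024, 5, 6, 14, 0), "deployed_at": datetime(2024, 5, 6, 16, 30)},
]

# Lead time for changes: commit-to-production time. The median is used
# here because it dampens the effect of outlier deploys.
lead_time = median(d["deployed_at"] - d["committed_at"] for d in deployments)

# Deployment frequency: deploys per week over the observed window.
span = max(d["deployed_at"] for d in deployments) - min(d["deployed_at"] for d in deployments)
deploys_per_week = len(deployments) / max(span / timedelta(weeks=1), 1 / 7)  # floor avoids div-by-zero

print(f"Median lead time: {lead_time}")            # 5:30:00
print(f"Deploys per week: {deploys_per_week:.1f}")  # ~4.1 on this sample
```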


Change failure rate captures the percentage of deployments that required remediation. Essentially, how much of the time did your code fail in production and require a hotfix, patch, or rollback?

MTTR (mean time to repair) is the average time it takes to recover from a failure and restore service when an incident occurs. The objective is to restore service as quickly as possible, ideally before users are affected. When software can be repaired quickly, that’s a sign of healthy architecture, solid incident response systems, and a well-coordinated software development team.
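
Both stability metrics fall out of the same incident records. A minimal sketch, assuming deployments are flagged when they triggered a failure (all records below are hypothetical):

```python
from datetime import datetime
from statistics import mean

# Hypothetical records: production deployments, flagged if they caused a
# failure, plus detection/restore times for the ones that did.
deployments = [
    {"id": "d1", "failed": False},
    {"id": "d2", "failed": True, "detected_at": datetime(2024, 5, 3, 10, 15), "restored_at": datetime(2024, 5, 3, 11, 0)},
    {"id": "d3", "failed": False},
    {"id": "d4", "failed": True, "detected_at": datetime(2024, 5, 8, 22, 0), "restored_at": datetime(2024, 5, 9, 1, 30)},
]

failures = [d for d in deployments if d["failed"]]

# Change failure rate: share of deployments that needed remediation.
change_failure_rate = len(failures) / len(deployments)

# MTTR: average elapsed time from detection to restored service.
recovery_times = [f["restored_at"] - f["detected_at"] for f in failures]
mttr_hours = mean(t.total_seconds() for t in recovery_times) / 3600

print(f"Change failure rate: {change_failure_rate:.0%}")  # 50%
print(f"MTTR: {mttr_hours:.1f} hours")                    # 2.1 hours
```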

Reliability measures operational performance: your team's ability to keep promises about the product they're building, which might be captured in artifacts like SLAs and error budgets.
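
Reliability is harder to pin to a single formula, but error budgets give one concrete handle on it. A minimal sketch, assuming a hypothetical 99.9% availability SLO over a 30-day window:

```python
# Hypothetical SLO: 99.9% availability over a rolling 30-day window.
slo_target = 0.999
window_minutes = 30 * 24 * 60  # 43,200 minutes

# Error budget: how much downtime the SLO tolerates before promises break.
error_budget_minutes = window_minutes * (1 - slo_target)  # 43.2 minutes

# Burn-down against the budget, using downtime summed from incident records.
observed_downtime_minutes = 12.0
budget_remaining = 1 - observed_downtime_minutes / error_budget_minutes

print(f"Budget: {error_budget_minutes:.1f} min, {budget_remaining:.0%} remaining")  # 43.2 min, 72% remaining
```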


As they stand now, DORA metrics are a measure of efficiency (throughput, stability, and reliability). Efficiency often means maximizing speed and output. It answers the question: How quickly and often can your teams deploy new code that works and does what is promised? 

Using a DORA-only approach, your “high” performing teams would be those that deploy frequently, quickly, responsively, and accurately — all good indicators of success. But what if they’re working on the wrong priorities or burning themselves out to get code into production, putting future work at risk? Would you still consider those high-performing teams?

What Software Efficiency Metrics Can’t Tell You

While the DORA metrics are concerned with team efficiency, they don’t reflect effectiveness.

At Uplevel, we define effectiveness as working on the right things in the right ways. That means aligning teams around the right priorities, giving them time to work on those priorities, and doing so in a culturally and productively sustainable way.

The DORA team doesn’t ignore the importance of these variables in their research. The addition of reliability does speak to the need for alignment with high-level business objectives (though it’s more difficult to measure than numerical metrics around deployment frequency, for example). Well-being was highlighted as a key focus point in the 2023 State of DevOps report, a reminder that DORA goes beyond just software delivery metrics. 

Let’s come back to the DORA Core Model, depicted in its entirety above but now simply represented like this:

dora-metrics-core-simplified

There are a few things to notice here (beyond the fact that the metrics are actually just a part of the whole picture):

  • DORA metrics are lagging indicators. They will give you a picture of delivery performance, but they won’t explain any of the context behind it (that is, which capabilities across tech, process, or culture are strong or lacking in your organization). 

  • DORA metrics don’t capture well-being, which is not only an outcome in itself but a co-predictor of organizational performance. 

Why is this? Why don’t the DORA metrics include leading indicators so that engineering leaders can catch problems before they begin? With well-being as a direct predictor of organizational performance (the end goal), why don’t people health metrics factor in?

We believe that DORA's focus on efficiency as a proxy for effectiveness is a reflection of the limitations in the data commonly available to engineering teams.

Outside of surveys, how would internal teams begin to measure or quantify “well-being” or “burnout” or “work recovery”? However, the data available is rapidly evolving — as is our ability to surface insights that lead to high-value outcomes.

A New Comprehensive Measure of Engineering Effectiveness

While the State of DevOps reports rely on self-reported survey data, many different engineering intelligence solutions today use telemetry across CI/CD, project management, and version control platforms to quantify how teams perform against each of the DORA metrics and provide high-level views of how trends are changing over time. 

No framework for understanding engineering performance will be perfect. It’s incredibly difficult to capture the nuance and complexity involved across humans, technologies, cultures, and processes when optimizing for effectiveness. 

But advanced ML models and composite metrics can now provide more visibility into the larger picture that the DORA team illustrates in the Core Model.

At Uplevel, we’ve found that engineering leaders struggle with three often competing mandates, of which the DORA metrics accurately measure one: delivering quality solutions, meeting commitments (alignment), and working sustainably (well-being). Succeeding at these challenges isn’t just a sign of efficiency but of engineering effectiveness. 

Measuring efficiency is critical to the success of our customers. Like any other engineering intelligence platform, Uplevel provides visibility into the four DORA metrics necessary to assess delivery throughput and stability. 

mttr

But if DORA is not the whole story, what other insights do we surface — and why?

Allocation Insights

A team working on quick, low-priority deployments all day may appear more effective than a team working slowly on a complicated feature build. That’s why it’s important to look at how closely engineering efforts align with business priorities. 

Uplevel's allocation insights show you how much of your organization’s time (down to the team level) is being spent on new value demand work vs. maintenance and support. With insights into how your teams are spending their time — separated into individual investment buckets — you can allocate people, effort, and investments to activities that would make a greater impact on the organization and have a better sense of how long it will take to deliver on goals. 

allocation-insights

This visibility helps to create alignment between engineering and other business functions, surfacing metrics that relate directly to DORA’s capabilities around creating a transparent value stream and implementing visual management of work to promote common understanding of that value stream.
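
Uplevel’s allocation models use far richer signals, but the core idea of investment buckets can be illustrated with a deliberately simple rule-based sketch (the issue types, labels, hours, and bucket names below are all hypothetical):

```python
from collections import defaultdict

# Hypothetical issue export: type, labels, and hours of effort attributed.
issues = [
    {"type": "Story", "labels": ["feature"],   "hours": 18},
    {"type": "Bug",   "labels": [],            "hours": 6},
    {"type": "Task",  "labels": ["tech-debt"], "hours": 9},
    {"type": "Story", "labels": ["feature"],   "hours": 12},
]

def bucket(issue):
    """Map an issue to an investment bucket; real taxonomies vary by org."""
    if "tech-debt" in issue["labels"]:
        return "maintenance"
    if issue["type"] == "Bug":
        return "support"
    return "new value demand"

# Sum effort per bucket, then report each bucket's share of the total.
totals = defaultdict(float)
for issue in issues:
    totals[bucket(issue)] += issue["hours"]

total_hours = sum(totals.values())
for name, hours in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {hours / total_hours:.0%}")  # new value demand: 67%, ...
```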

Challenge Accepted: Clean Project Management Data

In a perfect world, all JIRA tickets would be perfectly tagged and all PRs would be correctly linked. Here on Earth, however, less-than-perfect hygiene is part of most engineering organizations’ reality. While many platforms can’t handle data that’s not perfectly formatted, we believe that the state of your project management shouldn’t get in the way of a good data story. 

Uplevel’s advanced allocation logic compensates for the fact that JIRA gets messy. Different companies have different ways they categorize issues and label priority tags, and it can be difficult to track exactly how much time is spent on each task. Even if PRs are not directly linked to JIRA issues, Uplevel can still analyze signals like timestamps and comments to infer what tickets teams are working on when.
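
Uplevel’s actual inference is ML-driven; as a toy illustration of the idea, an unlinked PR can be attributed to the ticket whose in-progress window overlaps its commit activity the most (all keys, timestamps, and fields below are hypothetical):

```python
from datetime import datetime

# Hypothetical data: tickets with in-progress windows, plus an unlinked PR.
tickets = [
    {"key": "PAY-101", "started": datetime(2024, 5, 1), "finished": datetime(2024, 5, 4)},
    {"key": "PAY-107", "started": datetime(2024, 5, 6), "finished": datetime(2024, 5, 9)},
]
pr = {"author": "dev1",
      "first_commit": datetime(2024, 5, 2, 10, 0),
      "merged": datetime(2024, 5, 3, 16, 0)}

def overlap_hours(ticket, pr):
    """Hours of overlap between a ticket's window and a PR's commit activity."""
    start = max(ticket["started"], pr["first_commit"])
    end = min(ticket["finished"], pr["merged"])
    return max((end - start).total_seconds() / 3600, 0.0)

# Attribute the PR to the ticket with the most temporal overlap.
best = max(tickets, key=lambda t: overlap_hours(t, pr))
print(best["key"])  # PAY-101
```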

Capacity

With DORA metrics, you might see that your teams are deploying quickly and frequently and infer they have more capacity, but you would have no way to quantify it. 

For example, if you’re measuring deployment frequency in weeks or months, how much of that time did your teams actually spend on delivery? If their days were packed with meetings and other interruptions, they may not have had the capacity to deploy tons of new code. In that case, low deployment frequency may not accurately depict team efficiency. They may have performed at a relatively high level given their lower capacity. 

dora-metrics-capacity

Instead, we measure how much time teams have for deep work, which we define as uninterrupted blocks of two or more hours. Deep work insights account not only for planned interruptions like meetings but also for chat distractions that break your teams’ concentration throughout the day.
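
A minimal sketch of that calculation: treat the day as a timeline, subtract the interruptions, and keep only the gaps of two hours or more (the schedule below is illustrative):

```python
from datetime import datetime, timedelta

DEEP_WORK_MIN = timedelta(hours=2)

# One workday, with interruptions (meetings, chat bursts) as (start, end) pairs.
day_start = datetime(2024, 5, 6, 9, 0)
day_end = datetime(2024, 5, 6, 17, 0)
interruptions = sorted([
    (datetime(2024, 5, 6, 10, 0), datetime(2024, 5, 6, 10, 30)),  # standup
    (datetime(2024, 5, 6, 14, 0), datetime(2024, 5, 6, 14, 5)),   # chat burst
    (datetime(2024, 5, 6, 15, 30), datetime(2024, 5, 6, 16, 0)),  # 1:1
])

# Walk the day, keeping only free gaps long enough to count as deep work.
blocks, cursor = [], day_start
for start, end in interruptions:
    if start - cursor >= DEEP_WORK_MIN:
        blocks.append((cursor, start))
    cursor = max(cursor, end)
if day_end - cursor >= DEEP_WORK_MIN:
    blocks.append((cursor, day_end))

deep_work = sum((b - a for a, b in blocks), timedelta())
print(f"Deep work: {deep_work}")  # 3:30:00 (the 10:30-14:00 gap)
```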

Meeting and Chat Interruption Insights

Time spent in deep work is not easy to measure. Unscheduled time on a developer’s calendar doesn’t mean they’re able to focus on hard problems for a minimum of two hours at a time. In reality, engineers’ days often involve a lot of context switching: from meetings to Slack conversations to a few minutes of coding time and back again.

Uplevel’s ML models help decipher what is actually interrupting deep work time and harming team productivity. As the only engineering intelligence platform that analyzes meeting and chat metadata (such as meeting titles and durations, chat timestamps, and message character counts), Uplevel surfaces the frequency of these interruptions at the team level so that leaders can set goals and implement practices to maximize deep work time, prioritize well-being, and help their teams become more productive.

Together, allocation and capacity insights are a measure of organizational focus: what your teams should be working on and how much time they have to work on it. Viewing your engineering efforts through this lens can give you a more accurate idea of effectiveness and overall performance, as well as the role leadership plays in it.

Work Sustainability and Well-Being

Developer burnout is a significant problem, enough so that it’s been addressed in each DORA report since 2019. And rightly so — it doesn’t take a team of researchers to figure out that overworked, unmotivated employees pose a risk to quality, productivity, retention, and organizational performance.

Yet measures of burnout are not included in the DORA metrics — again, likely because it’s difficult to quantify well-being and then tie it to more objective performance measures. But since those performance measures are lagging indicators, waiting for them to decline before recognizing a problem in your engineering culture isn’t helpful either. Any true measure of engineering performance should account for the negative impacts of burnout as early as possible, prompting investigation into root causes and cultural improvements that can be implemented before DORA metrics begin to drop.

To quantify burnout risk at the team level, Uplevel uses composite metrics across systems. Burnout risk is a combination of low deep work time (typically associated with high numbers of “short fragments” that indicate interrupted work) and “Always On” behaviors that show messaging, JIRA, and calendar activity beyond normal working hours. This is perhaps the most important context you can give to your efficiency metrics, as it actually quantifies your teams’ burnout risk to indicate sustainability over time. 

DORA Metrics - Burnout Risk
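
Uplevel’s model is a calibrated composite across these systems. As a toy illustration of how such a score could be assembled, one might blend a deep work deficit with an after-hours activity signal (every threshold, weight, and field below is made up for the example):

```python
# Hypothetical weekly signals for one team.
team_week = {
    "deep_work_hours": 6.0,     # time in uninterrupted 2h+ blocks
    "after_hours_events": 21,   # chat/JIRA/calendar activity off-hours
}

DEEP_WORK_TARGET = 15.0  # illustrative healthy baseline, hours/week

def clamp(x):
    return max(0.0, min(1.0, x))

# Two components, each normalized to 0..1.
deficit = clamp(1 - team_week["deep_work_hours"] / DEEP_WORK_TARGET)
always_on = clamp(team_week["after_hours_events"] / 25)  # 25+ events/week saturates

# Simple weighted blend; a real model would calibrate weights against outcomes.
burnout_risk = 0.5 * deficit + 0.5 * always_on
print(f"Burnout risk: {burnout_risk:.2f}")  # 0.72
```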

AI-Powered Context for DORA Metrics

A blend of DevOps efficiency and effectiveness metrics is required for enterprise companies to deliver value in a tough market. But allocation, capacity, and burnout insights, critical as they are, often go missing from the equation, likely because they are so difficult to capture and turn into action.

Thankfully, you no longer have to go without.

Modern engineering leadership goes beyond homegrown BI platforms or quarterly dev surveys. Leveraging AI/ML technology, data science best practices, and people health capabilities, Uplevel provides you with the comprehensive team telemetry you need to turn engineering data — including DORA metrics — into actionable change.

Schedule a demo today.
