DORA Metrics Are a Start. Here's What Comes Next.

Written By Lauren Lang

    The "DORA metrics" are the work of Google’s DevOps Research and Assessment (DORA) team, who gathered survey data from more than 31,000 developers over the course of five years. The outcome of that research was 2019’s State of DevOps report and accompanying book Accelerate, which outlined its findings and methodology. Both publications identified four key engineering metrics (with a fifth added in 2021) for measuring software delivery performance.

    They also identified a number of core DevOps capabilities that allow optimal performance to occur, along with the high-level organizational outcomes that those capabilities and metrics support. These capabilities and outcomes tend to fall by the wayside (which, as we'll explain, is important).

    The “DORA metrics” (in green in the diagram below) have been widely adopted across large organizations and, with Google’s weight behind them, are considered the industry standard. As DORA is discussed and applied today, it is, in most cases, synonymous with these key metrics.

    The problem? They only tell half the story.


    Nearly a decade after the research kicked off, forward-thinking companies, DevOps experts, and even DORA team members themselves are pulling back on the reins. It’s not that the five metrics are wrong — it’s more that they are a measure of efficiency more than overall performance. But efficiency is only one part of the DORA Core Model that looks at engineering effectiveness holistically and in context.

    That’s why, at Uplevel, we look at DORA metrics — but we look at them as part of a larger picture.

    DORA Engineering Metrics: What Are They and What Do They Measure?

    There are five DORA DevOps metrics that fall into three main categories outlined by the DORA team:

    • Throughput (velocity) metrics give leaders a picture of delivery during the development process and a way to measure velocity outside of arbitrary measurements like story points.

    • Stability (quality) metrics speak to the quality of the deployment processes.

    • Reliability is often overlooked (you'll still see many references to the "four" DORA metrics), but was added later as its own category and metric.

    Lead time for changes indicates the amount of time between code being committed and being deployed to production. Are automated tests and a healthy CI/CD pipeline reducing the time a pull request sits in queue? Is there a healthy code review process for what must be done manually?

    Deployment frequency quantifies how often teams successfully release code to production. DevOps best practice holds that the total number of deployments should be high: deploying to production more frequently is a byproduct of streamlined processes and less complex PRs, which indicate healthy code reviews and more opportunities to catch errors.


    Change failure rate captures the percentage of deployments that required remediation. Essentially, how much of the time did your code fail in production and require a hotfix, patch, or rollback?

    MTTR (mean time to repair) is the average time it takes to restore service after a failure or incident. The objective is to recover as quickly as possible, ideally before users are affected. When software can be repaired quickly, that’s a sign of healthy architecture, solid incident response systems, and a well-coordinated software development team.

    Reliability measures operational performance: your team's ability to keep promises about the product they're building, which might be captured in artifacts like SLAs and error budgets.
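To make the first four metrics concrete, here is a minimal sketch of how they could be computed from raw deployment records. The data shape and field names are hypothetical, invented for illustration; production systems would pull these events from CI/CD and incident tooling.

```python
from datetime import datetime
from statistics import mean

# Hypothetical deployment records: commit time, deploy time, whether the
# deploy failed in production, and recovery time if it did.
deploys = [
    {"committed": datetime(2024, 5, 1, 9),  "deployed": datetime(2024, 5, 1, 15), "failed": False, "recovery_hours": 0},
    {"committed": datetime(2024, 5, 2, 10), "deployed": datetime(2024, 5, 3, 11), "failed": True,  "recovery_hours": 2},
    {"committed": datetime(2024, 5, 6, 8),  "deployed": datetime(2024, 5, 6, 12), "failed": False, "recovery_hours": 0},
]

# Lead time for changes: average commit-to-deploy duration, in hours.
lead_time = mean((d["deployed"] - d["committed"]).total_seconds() / 3600 for d in deploys)

# Deployment frequency: deploys per week over the observed window.
window_days = (max(d["deployed"] for d in deploys) - min(d["deployed"] for d in deploys)).days or 1
freq_per_week = len(deploys) / (window_days / 7)

# Change failure rate: share of deploys that needed a hotfix, patch, or rollback.
cfr = sum(d["failed"] for d in deploys) / len(deploys)

# MTTR: mean recovery time across failed deploys only.
failures = [d for d in deploys if d["failed"]]
mttr = mean(d["recovery_hours"] for d in failures) if failures else 0.0

print(f"lead time: {lead_time:.1f}h, freq: {freq_per_week:.2f}/wk, CFR: {cfr:.0%}, MTTR: {mttr:.1f}h")
```

Reliability is deliberately absent from the sketch: it is measured against service-level commitments (SLAs, error budgets) rather than deployment events.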


    As they stand, DORA metrics measure efficiency in terms of throughput, stability, and reliability. This means maximizing speed and output, determining how quickly and how often your teams can deploy functional code.

    Using a DORA-only approach, “high” performing teams deploy frequently, quickly, responsively, and accurately — indicators of success. But what if they’re focusing on the wrong priorities or burning out to meet production demands, jeopardizing future work? Are they still high-performing teams?

    What Software Efficiency Metrics Can’t Tell You

    While DORA metrics focus on team efficiency, they don't reflect effectiveness.

    At Uplevel, we define effectiveness as working on the right things in the right ways. This means aligning teams around the right priorities, giving them time to work on those priorities, and doing so sustainably.

    The DORA team acknowledges these variables. The addition of reliability addresses alignment with business objectives, though it's harder to measure than deployment frequency. The 2023 State of DevOps report emphasized well-being, showing that DORA's research goes beyond software delivery. This holistic view is also captured in the SPACE framework, introduced by members of the original DORA team, though it doesn't prescribe specific metrics.

    Let’s come back to the DORA Core Model, depicted in its entirety above but now simply represented like this:


    There are a few things to notice here (beyond the fact that the metrics are actually just a part of the whole picture):

    • DORA metrics are lagging indicators. They will give you a picture of delivery performance, but they won’t explain any of the context behind it (that is, which capabilities across tech, process, or culture are strong or lacking in your organization). 

    • They don’t capture well-being, which is not only an outcome in itself but a co-predictor of organizational performance. 

    Why is this? Why don’t classic software engineering metrics include leading indicators to catch problems early? Given well-being's role in predicting organizational performance, why aren't people health metrics included?

    We believe DORA's focus on efficiency as a proxy for effectiveness reflects the limitations of data available to engineering teams.

    Outside of surveys, how can teams measure “well-being,” “burnout,” or “work recovery”? The available data is rapidly evolving, as is our ability to surface insights that lead to high-value outcomes.

    A New Comprehensive Measure of Engineering Effectiveness

    While the State of DevOps reports rely on self-reported survey data, many engineering intelligence solutions now use telemetry from CI/CD, project management, and version control platforms to quantify team performance against DORA metrics and track trends over time.

    No framework for understanding engineering performance is perfect. Capturing the nuance and complexity across humans, technologies, cultures, and processes when optimizing for effectiveness is incredibly difficult.

    However, advanced ML models and composite metrics now offer greater visibility into the larger picture outlined by the DORA team in the Core Model.

    At Uplevel, we've found that engineering leaders struggle with three often competing mandates, of which DORA metrics measure only one: delivering quality solutions, meeting commitments (alignment), and working sustainably (well-being). Succeeding at these challenges reflects not just efficiency but engineering effectiveness.

    Measuring efficiency is critical for our customers. Like any engineering intelligence platform, Uplevel provides visibility into the DORA metrics to assess delivery throughput and stability.


    But if DORA is not the whole story, what other insights do we surface — and why?

    Allocation Insights

    A team working on quick, low-priority deployments may appear more effective than one working slowly on a complicated feature. That’s why aligning engineering efforts with business priorities is crucial.

    Uplevel's allocation insights reveal how much time your organization spends on new value demand work versus maintenance and support. By understanding how teams allocate their time, you can effectively distribute people, effort, and investments to impactful activities and better estimate delivery timelines.
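An allocation view like this can be approximated from ticket data alone. The sketch below is illustrative only, with invented category names and hours; it simply rolls logged effort up by work category and expresses each as a share of the total.

```python
from collections import Counter

# Hypothetical tickets, each tagged with a work category and hours logged.
tickets = [
    {"category": "feature",     "hours": 30},
    {"category": "feature",     "hours": 12},
    {"category": "maintenance", "hours": 10},
    {"category": "support",     "hours": 8},
]

# Roll hours up by category, then express each as a share of total effort.
totals = Counter()
for t in tickets:
    totals[t["category"]] += t["hours"]
grand_total = sum(totals.values())
allocation = {cat: hours / grand_total for cat, hours in totals.items()}

print(allocation)  # feature work is 42 of 60 hours, i.e. 70%
```

The same roll-up can be segmented by team or sprint to show how new-value work competes with maintenance and support over time.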


    This visibility helps align engineering with other business functions, surfacing metrics that support DORA’s goals of creating a transparent value stream and promoting a common understanding of work through visual management.

    Challenge Accepted: Clean Project Management Data

    In a perfect world, every Jira ticket would be correctly tagged and every PR correctly linked. Here on Earth, however, less-than-perfect hygiene is part of most engineering organizations’ reality. While many platforms can’t handle data that isn’t perfectly formatted, we believe that the state of your project management shouldn’t get in the way of a good data story.

    Uplevel’s advanced allocation logic compensates for the fact that Jira gets messy. Different companies categorize issues and label priority tags differently, and it can be difficult to track exactly how much time is spent on each task. Even if PRs are not directly linked to Jira issues, Uplevel can still analyze signals like timestamps and comments to infer which tickets teams are working on, and when.
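One simple version of this kind of inference is temporal proximity: attribute an unlinked PR to the ticket its author touched most recently within a time window. The sketch below is a toy heuristic under that assumption, not Uplevel's actual model; all names, IDs, and the window size are invented.

```python
from datetime import datetime, timedelta

# Hypothetical activity signals: ticket touches and PR events, each with
# an author and timestamp.
ticket_events = [
    {"ticket": "PROJ-101", "author": "dana", "at": datetime(2024, 5, 7, 9, 30)},
    {"ticket": "PROJ-205", "author": "dana", "at": datetime(2024, 5, 7, 13, 0)},
]
pr_events = [
    {"pr": 42, "author": "dana", "at": datetime(2024, 5, 7, 13, 45)},
]

WINDOW = timedelta(hours=4)  # illustrative lookback window

def infer_ticket(pr):
    """Attribute a PR to the same author's most recent ticket touch within WINDOW."""
    candidates = [e for e in ticket_events
                  if e["author"] == pr["author"]
                  and timedelta(0) <= pr["at"] - e["at"] <= WINDOW]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e["at"])["ticket"]

print(infer_ticket(pr_events[0]))  # → PROJ-205
```

A real model would combine many more signals (comments, branch names, commit messages), but the principle is the same: infer links that the data hygiene never recorded.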


    Capacity Insights

    With DORA metrics, you might see that your teams are deploying quickly and frequently and infer that they have more capacity, but you would have no way to quantify it.

    For example, if you’re measuring deployment frequency in weeks or months, how much of that time did your teams actually spend on delivery? If their days were packed with meetings and other interruptions, they may not have had the capacity to deploy tons of new code. In that case, low deployment frequency may not accurately depict team efficiency. They may have performed at a relatively high level given their lower capacity. 


    Instead, we measure how much time teams have for deep work, which we define as uninterrupted blocks of two or more hours. Deep work insights account not only for planned interruptions like meetings but also for chat distractions that break your teams’ concentration throughout the day.
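The deep work definition above (uninterrupted blocks of two or more hours) can be sketched as an interval computation over a day's interruptions. The data and working hours below are invented for illustration; a real pipeline would pull busy intervals from calendar and chat metadata.

```python
from datetime import datetime, timedelta

DEEP_WORK_MIN = timedelta(hours=2)
DAY_START = datetime(2024, 5, 7, 9)
DAY_END = datetime(2024, 5, 7, 17)

# Hypothetical interruptions for one developer's day: meetings plus a chat
# thread, each modeled as a busy interval.
interruptions = [
    (datetime(2024, 5, 7, 10, 0), datetime(2024, 5, 7, 10, 30)),   # standup
    (datetime(2024, 5, 7, 13, 0), datetime(2024, 5, 7, 14, 0)),    # planning
    (datetime(2024, 5, 7, 14, 10), datetime(2024, 5, 7, 14, 20)),  # Slack thread
]

def deep_work_hours(busy, day_start, day_end, minimum=DEEP_WORK_MIN):
    """Sum the free gaps of at least `minimum` between sorted busy intervals."""
    total = timedelta()
    cursor = day_start
    for start, end in sorted(busy):
        gap = start - cursor
        if gap >= minimum:  # only gaps long enough to count as deep work
            total += gap
        cursor = max(cursor, end)
    if day_end - cursor >= minimum:  # free block at the end of the day
        total += day_end - cursor
    return total.total_seconds() / 3600

print(deep_work_hours(interruptions, DAY_START, DAY_END))
```

Note how the 10:00 standup erases the entire 9:00-10:00 hour from the deep work total: a short interruption costs more than its own duration, which is exactly why interruption frequency matters.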

    Meeting and Chat Interruption Insights

    Time spent in deep work is not easy to measure. A developer’s unscheduled time on a calendar doesn’t mean that they’re able to focus on hard problems for a minimum of two hours at a time. In reality, engineers’ days often involve a lot of context switching: from meetings to Slack conversations to a few minutes of coding time and back again.

    Uplevel’s ML models help decipher what is actually interrupting deep work time and harming team productivity. As the only engineering intelligence platform that analyzes meeting and chat metadata (such as meeting titles and durations, chat timestamps, and message character counts), Uplevel surfaces the frequency of these interruptions at the team level so that leaders can set goals and implement practices to maximize deep work time, prioritize well-being, and help their teams become more productive.

    Together, allocation and capacity insights are a measure of organizational focus: what your teams should be working on and how much time they have to work on it. Viewing your engineering efforts through this lens can give you a more accurate idea of effectiveness and overall performance, as well as the role leadership plays in it.

    Work Sustainability and Well-Being

    Developer burnout is a significant problem — it’s been addressed in each DORA report since 2019. Rightly so. It doesn’t take a team of researchers to figure out that overworked, unmotivated employees pose a risk to quality, productivity, retention, and organizational performance.

    However, burnout measures aren't included in DORA metrics, likely due to the difficulty of quantifying well-being and linking it to objective performance. But waiting for performance measures to decline before recognizing burnout risks isn't helpful. True measures of engineering performance should account for burnout early, leading to root cause analysis and cultural improvements before DORA metrics drop.

    To quantify burnout risk at the team level, Uplevel uses composite metrics across systems. Sustained Always On combines low deep work time (indicative of interrupted work) and “Always On” behaviors (Jira, messaging, and calendar activity beyond normal hours). This context is crucial for understanding efficiency metrics and sustainability over time.
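A composite signal of this shape can be sketched as a simple rule over weekly team telemetry. The thresholds, field names, and flagging logic below are illustrative assumptions, not Uplevel's actual model; the point is only that burnout risk is flagged when both signals co-occur over sustained periods, not from a single bad week.

```python
# Illustrative thresholds, not Uplevel's actual parameters.
LOW_DEEP_WORK = 2.0    # avg deep work hours/day below which work looks fragmented
ALWAYS_ON_EVENTS = 15  # weekly after-hours events (Jira, chat, calendar) above which risk rises
SUSTAINED_WEEKS = 3    # consecutive weeks required before flagging

# Hypothetical weekly signals for one team.
weeks = [
    {"deep_work_avg": 1.5, "after_hours_events": 22},
    {"deep_work_avg": 1.8, "after_hours_events": 19},
    {"deep_work_avg": 1.2, "after_hours_events": 25},
]

def sustained_always_on(history):
    """Flag a team when both risk signals co-occur for several consecutive weeks."""
    streak = 0
    for week in history:
        if week["deep_work_avg"] < LOW_DEEP_WORK and week["after_hours_events"] > ALWAYS_ON_EVENTS:
            streak += 1
        else:
            streak = 0  # one healthy week resets the streak
        if streak >= SUSTAINED_WEEKS:
            return True
    return False

print(sustained_always_on(weeks))  # → True
```

Requiring a sustained streak rather than a single threshold breach keeps the signal robust to one-off crunch weeks.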


    AI-Powered Context for DORA Metrics

    Enterprise companies need a blend of DevOps efficiency and effectiveness metrics to deliver value in a tough market. When allocation, capacity, and burnout insights are missing from the equation despite being critical, it’s likely because they are difficult to capture and turn into insights.

    Thankfully, you no longer have to go without.

    Modern engineering leadership goes beyond homegrown BI platforms or quarterly dev surveys. Leveraging AI/ML technology, data science best practices, and people health capabilities, Uplevel provides you with the comprehensive team telemetry you need to turn engineering data — including DORA metrics — into actionable change.

    Schedule a demo today.