Measuring developer productivity is messy and hard primarily for two reasons:
- Goodhart’s Law holds that once a metric becomes a target, it ceases to be a good measure. People respond to incentives, and if the incentive is to raise a metric, the real incentive is to game it.
- “All models are wrong, but some are useful,” as statistician George Box put it. No framework for productivity can perfectly explain human behavior. There will always be variance, gaps, and complexity that any framework must ignore in exchange for its simplicity.
But despite the lack of consensus on how to measure developer productivity, there is now a whole market of AI coding assistant tools claiming they can improve it.
The verdict on that claim isn’t clear (as we’ll explain), but in a world where the majority of developers report using AI, avoiding it isn’t an option. The moral of the story? Proceed with caution. Companies have to shift from heedless adoption to careful experimentation if they want even a chance of reaping the productivity benefits AI tools claim to offer.
The Results Are In: GenAI Isn't Improving Developer Productivity (Yet)
GitHub’s initial research found that Copilot had a positive impact on developer experience and self-reported productivity. Uplevel’s own quantitative study, by contrast, found that Copilot access didn’t result in productivity improvements but did increase the rate of bugs produced. Our research joins a growing body of work showing benefits, drawbacks, and, above all, muddiness.
If there’s one takeaway from all the research done so far, it’s that there is no clear takeaway beyond this: AI coding tools won’t be a panacea.
Little to no productivity gain
Our research showed little to no productivity gain from using GitHub Copilot and significant potential downsides.
Analyzing actual engineering data from a sample of nearly 800 developers and objective metrics, such as cycle time, PR throughput, bug rate, and extended working hours (“Always On” time), we found that Copilot access provided no significant change in efficiency metrics.
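To make these metrics concrete, here’s a minimal sketch of how cycle time and PR throughput can be computed from pull-request records. The schema and data are invented for illustration; this is not Uplevel’s actual data model.

```python
import pandas as pd

# Hypothetical PR records; the field names are illustrative only.
prs = pd.DataFrame({
    "author": ["alice", "alice", "bob", "bob"],
    "opened_at": pd.to_datetime([
        "2024-03-01 09:00", "2024-03-04 10:00",
        "2024-03-02 11:00", "2024-03-05 14:00",
    ]),
    "merged_at": pd.to_datetime([
        "2024-03-02 15:00", "2024-03-05 09:30",
        "2024-03-04 16:00", "2024-03-06 10:00",
    ]),
})

# Cycle time: elapsed time from opening a PR to merging it.
prs["cycle_time_hours"] = (
    (prs["merged_at"] - prs["opened_at"]).dt.total_seconds() / 3600
)

# PR throughput: merged PRs per developer over the observation window.
throughput = prs.groupby("author").size()

print(prs[["author", "cycle_time_hours"]])
print(throughput)
```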
And while other studies focus almost entirely on these efficiency metrics, ours also included a holistic perspective that revealed two major downsides:
- The group using Copilot introduced 41% more bugs, suggesting Copilot might hurt code quality, given that it didn’t affect PR throughput.
- Copilot access didn’t mitigate the risk of burnout, as indicated by our “Sustained Always On” metric, which measures extended working time outside of standard hours (one way to compute such a signal is sketched below). Developers without Copilot access reduced this burnout risk at a rate of 28%, whereas those with Copilot access reduced it by only 17%.
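As an illustration of the idea behind an “Always On”-style metric, the sketch below computes the share of a developer’s activity that falls outside standard working hours. The event data, working-hours threshold, and formula are all assumptions for illustration, not Uplevel’s actual methodology.

```python
import pandas as pd

# Hypothetical activity events (commits, reviews, messages); invented data.
events = pd.DataFrame({
    "developer": ["alice"] * 5 + ["bob"] * 5,
    "timestamp": pd.to_datetime([
        "2024-03-04 10:15", "2024-03-04 14:40", "2024-03-04 21:30",
        "2024-03-05 09:05", "2024-03-05 22:10",
        "2024-03-04 09:30", "2024-03-04 11:00", "2024-03-04 15:45",
        "2024-03-05 10:20", "2024-03-05 16:05",
    ]),
})

WORK_START, WORK_END = 9, 18  # assumed standard hours, 09:00-18:00

hour = events["timestamp"].dt.hour
events["out_of_hours"] = (hour < WORK_START) | (hour >= WORK_END)

# Share of each developer's activity falling outside standard hours; a
# persistently high share over many weeks would flag sustained burnout risk.
always_on_share = events.groupby("developer")["out_of_hours"].mean()
print(always_on_share)
```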
As we’ve covered before, qualitative research (such as asking developers via survey if they feel more productive or focused with AI tools) suffers from a range of biases that make survey results suspect — and incomplete — without quantitative support. Developers might report initial happiness with AI tools, but happiness isn’t the whole story.
Long-term tradeoffs
The rise in bugs is especially alarming because developers can feel more productive from the novelty of these tools but end up contributing to a less effective engineering organization if code quality dips, tech debt builds up, or developer toil rises. What feels like short-term efficiency can actually become a long-term productivity drain.
There are a number of reasons why AI might push bug rates up:
- A block of code the AI tool suggests could look reasonable enough to be accepted but hide a subtle error that only shows up in production (see the sketch after this list).
- An AI tool might suggest a logical addition to a given block of code without understanding the end goal of its functionality, making the code harder to parse later on.
- As developers get used to AI coding assistants, they can start making larger and larger changes per PR, making a reviewer more likely to miss bugs.
- Reviewers might use AI tools to help them review PRs, creating another opportunity to miss bugs.
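To make the first failure mode concrete, here’s a hypothetical example of a suggestion that reads plausibly in review but hides a subtle bug: a Python function with a mutable default argument.

```python
# A plausible-looking helper of the kind an assistant might suggest. It
# passes a quick read, but the mutable default list is shared across every
# call that omits `tags`, so state leaks between unrelated calls.
def add_tag(item, tags=[]):
    tags.append(item)
    return tags

print(add_tag("urgent"))   # ['urgent']
print(add_tag("review"))   # ['urgent', 'review'] -- leaked from the first call

# The version a careful reviewer would push for:
def add_tag_fixed(item, tags=None):
    if tags is None:
        tags = []  # a fresh list on every call
    tags.append(item)
    return tags
```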
All these issues can get worse depending on the language in use. AI tools tend to work better with well-documented and extensively used languages, like Python, which might give developers and reviewers false confidence when they use those tools in other languages.
Revealed preference shows hunger for AI coding tools
Many studies that purport to show the productivity benefits of AI coding tools rely on productivity metrics you might not accept outside of those studies. If a developer would refuse to be measured by lines of code produced, for example, why trust a study that proves the power of an AI coding tool with that same metric?
We don’t expect or want our research to cut off or even slow down the adoption of AI coding assistants. GitHub research already shows that 97% of developers are using them.
That said, big numbers like these obscure the details. Yes, AI tool adoption is high among developers, but which tools are they adopting? For which use cases? How often are they using them? And are the tools actually effective?
Adoption is high, but the details of the adoption are unclear. Some developers use generative AI tools to generate code; others use these tools to get on-demand explanations of blocks of code; others use these tools for bug detection and testing; and still others use them for writing documentation and release notes.
As of now, the technology is still so new (and evolving every day across an ever-growing market of options) that the industry doesn’t have a clear picture of the best use cases or best practices.
The Case-by-Case Search for Use Cases
AI might be transformative, but the technology will not sweep through your company and improve everything it touches. Instead, companies have to be strategic about how they roll out each AI function, work to identify the right use cases, and search for the tools that suit them.
Non-coding use cases could be even better for productivity
AI coding tools could change the way we build software in dramatic ways, and the biggest change might not be to coding itself.
In a 2024 study by Microsoft, for example, 96% of developers predicted that AI would relieve them of the most tedious, most routine tasks, such as generating tests and documentation. The study says, “These tasks, while essential, are often seen as monotonous and distract from the more creative aspects of development.”
Similarly, the study showed that 37% of developers hoped AI would simplify administrative work, such as reading emails and managing tasks. “These duties, while not core to development,” the study says, “consume substantial time and are ripe for AI’s organizational capabilities.”
Remember, developers are, by nature, experts in software engineering. If we consider productivity holistically, it shouldn’t be surprising that the best productivity gains might come from automating non-coding tasks so that developers can devote more time to using their specialized skills.
Your company’s particulars can change your use cases
One thing big productivity studies miss, by nature, is how your company’s particular context can change the results.
A study from one company could show a productivity boost, but following in their footsteps might not produce the same results for you – even if you do everything the same way.
The variables are endless, but a company’s mix of junior and senior developers stands out as a special factor. According to Steve Morin, Head of Mobile Engineering at Asana, companies need to compare the value of an AI coding assistant helping, for example, a 15-year senior engineer vs. a new graduate just out of college. “It seems like getting my new grad to go faster and not bug their onboarding buddy with every little question is high leverage and easy,” Morin says.
In a similar vein, research shows that the benefit of AI coding tools can shift depending on a developer’s tenure. In a 2024 study, researchers found that “Copilot significantly raises task completion for more recent hires and those in more junior positions but not for developers with longer tenure and in more senior positions.”
This result means any given company’s choice to adopt AI might change dramatically depending on its ratio of senior developers to junior developers. The productivity benefits you see – if any – will always be context-dependent.
AI's real benefits might not lie in productivity
The benefits of AI might one day be substantial, but they might not tie directly to sheer productivity.
AI tools could, for example, help developers shift priorities from routine work to creative work. Morin argues that, with AI, “Engineers will be able to not focus on some parts of the work and focus more time on other parts of the work. It will improve the technology but not by eliminating the person. It will improve the ability of people to do more sophisticated tasks."
But if the benefits are orthogonal to raw productivity, the risks might be, too. In our study, we saw a 41% increase in the rate of bugs produced, and we saw little benefit to “Sustained Always On,” our indicator of burnout risk.
In the end, the biggest risks might come down to opportunity cost.
Many development teams already have difficulty communicating with each other, for example. At first glance, a coding assistant that routes around the problem by giving a developer an instant answer seems like a good solution. But if those quick answers bury the communication problem, we’re hiding it while creating collaboration issues downstream.
AI often promises frictionless bliss, but in many cases, friction can be good. You want someone to review your code, for example, and in a sense, you might want to face communication problems because that might force you to solve them.
Embracing a Scientific Mindset
The results are clear because the results are, well, unclear: AI will not solve the developer productivity problem or provide a velocity boost without tradeoffs. Instead, companies need to embrace the hard work of adopting AI from a mindset of structured experimentation and rigorous testing. Company leaders need to be scientists, not pioneers.
Ask the right questions
The right mindset starts with the right questions. For many organizations, sheer development velocity is not the bottleneck, even in discussions around developer productivity. If you do value stream mapping, for example, you might discover many other hindrances on the way to an efficient engineering org.
Step back and ask: What are the biggest problems the engineering organization is actually facing? The answer to this might mean the introduction of AI tools – even if they do improve speed – will provide little help.
If, for example, your team doesn’t have strong practices around writing effective tickets and the tickets frequently include unclear requirements, then AI will only help you produce the wrong features faster. Building a solid foundation is a necessary precondition to introducing AI.
Set specific goals
Before even shopping for AI tools, companies should consider the outcomes they want to achieve. For every outcome, they need to think carefully about what success would look like and how they’ll measure progress toward it.
Companies should approach goal-setting with rigor, but even then, goals will also require revision and experimentation. If you set the goal of increasing productivity, for example, and discover minor productivity improvements but major issues with bug rates, you’ll need to reset your goals or look for different ways to achieve them while minimizing tradeoffs.
Experiment, learn, and train
If and when you adopt an AI tool, that adoption can’t be the end of your work: Companies have to keep experimenting, keep learning, and keep training.
Instead of throwing AI tools at users and hoping for the best, find the best use cases and search for the prompts that yield the best results. As you iterate, share your findings across your company so that others can replicate or build on each success.
Along the way, you can build guardrails and guidelines so that experimentation comes with minimal risk. Morin recommends that companies build a Center of Excellence, arguing that companies should encourage a team of people who are especially passionate about AI to “go deep and show people where to leverage it.”
If you don’t, he warns that you can end up with a tool that does little more than eat up your budget. “If you're going to pay for a tool, especially if there’s a site-wide license for X hundreds of engineers, you want to train people to actually get the usage out of it. Otherwise, it might be sitting empty, and you'll just be paying for empty seats.”
A 2024 study echoes this warning. In the study, researchers ran three trials at Microsoft, Accenture, and an anonymous Fortune 100 electronics manufacturing company, but in the course of the study, many developers didn’t even try the AI tools.
“The adoption rate is significantly below 100% in all three experiments,” the researchers wrote. “With around 30-40% of the engineers not even trying the product. [...] Factors other than access, such as individual preferences and perceived utility of the tool, play important roles in engineers’ decisions to use this tool.”
If you want to learn whether AI tools boost productivity, then you need to be able to show your developers why and how AI tools can be effective.
Build and monitor engineering effectiveness metrics
Once you set goals, you need to have a baseline metric that you can test against as you start to experiment.
As of now, many AI tools don’t offer observability functions that will help you determine their impact or their tradeoffs. Start A/B testing on your own to gain objective, quantitative insight into whether AI is actually improving developer productivity and helping you reach your operational goals.
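As a sketch of what such a comparison can look like, the snippet below tests whether bug rates differ between a pilot cohort with an AI assistant and a control cohort. The counts are invented, and a two-proportion z-test is just one reasonable choice of test.

```python
from statsmodels.stats.proportion import proportions_ztest

# Invented counts for illustration: PRs with bugs found, and total PRs
# merged, in each cohort over the trial window.
bugs = [141, 100]       # [with AI assistant, control]
merged = [1000, 1000]   # PRs merged per cohort

stat, p_value = proportions_ztest(count=bugs, nobs=merged)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests the gap in bug rates isn't noise; weigh it
# alongside cycle time, throughput, and survey data before deciding.
```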
ANZ Bank, for example, trialed GitHub Copilot with a select team and measured the results via A/B testing before determining whether to roll it out to the rest of the company. Interestingly, ANZ Bank’s internal study found that Copilot was most beneficial for expert Python programmers, whereas other studies, such as this one from GitHub, showed that less experienced developers benefited more from Copilot.
The GitHub study isn’t necessarily wrong, but ANZ Bank found that GitHub’s results might not apply in the bank’s company-specific context. That kind of validation framework is something every company adopting AI tools needs. Not many organizations have the resources or the native tooling to support this research, but that’s where an engineering dashboard like Uplevel can help.
Developer Productivity Remains an Unsolved Problem
AI – even if it were conclusively proven to help developers code faster – would not solve the developer productivity problem. As it is, most companies struggle to measure or even define developer productivity, so tools promising to accelerate productivity shouldn’t fill any company with confidence quite yet.
Baseline metrics – or the lack thereof – make this clear. Even though AI has the potential to help with productivity, few companies will be able to conclusively prove this and invest the right amount of resources in the right ways. Too many companies don’t yet have a developer productivity baseline from which they can measure impact.
Teams need to take a holistic approach to their organization to identify the biggest blockers to productivity — using both qualitative and quantitative data — and only then assess whether AI can target those specific problems. AI will not be a one-size-fits-all approach.
Don’t let the hype create heedlessness. Read the research, do your own research, and prioritize rigor over sheer innovation.
With special thanks to Matt Hoffman, product manager and data analyst at Uplevel