Quantitative Measures of Developer Experience

Written by Nick Moore | Jul 5, 2024

Dan North, the originator of Behaviour-Driven Development, once saved the worst programmer he knew from getting fired. A quantitative measure told management that Tim, the programmer in question, had delivered no measurable results week after week and had to go. 

North disagreed, writing, “Tim wasn’t delivering software; Tim was delivering a team that was delivering software.” Tim’s value extended beyond what the quantitative measures could capture, and North convinced the company to retain him. 

Stories like these have scared many modern companies away from quantitative approaches to productivity and experience. But the pendulum has swung to the other extreme: Now, too many companies focus almost entirely on qualitative measures—often through developer surveys—that produce results that are only superficially meaningful. 

We don’t need to repeat the past: With better research and better tools, we can provide holistic information that captures developer experience, analyzes engineering culture, and allows teams to use surveys less frequently and more impactfully. 

Developer Surveys Are Not a Complete Measure of Developer Experience

Erika Hall, the co-founder of Mule Design Studio, once wrote, “Surveys are the most dangerous research tool — misunderstood and misused. They frequently straddle the qualitative and quantitative, and at their worst represent the worst of both.” 

This idea surprises many people. What is a survey, after all, but a list of questions sent to numerous people at once? The answer is nuanced, and the nuances make all the difference: a bad survey is worse than no survey at all.

Surveys are prone to biases

The more you look into the challenges of survey design, the more you realize how difficult it is to put together a survey that generates meaningful, useful results. It’s a huge challenge masquerading as an easy shortcut. Luckily for us, decades of work by sociologists and political researchers have already established just how hard it is, and biases tend to top their list of challenges.

Hindsight bias

Like everyone, developers don’t have perfect memories. With hindsight bias, developers taking surveys misremember their earlier predictions once they know how things turned out. This can look, for example, like developers insisting they knew a spike would take more time than their project manager predicted, even though they would have said otherwise if asked at the time.

Recency bias

Like everyone else, developers can have their memories warped by what happened most recently. With recency bias, developers might not remember in detail how they felt three weeks ago and assume meetings plagued them all month simply because the survey arrived right after a long meeting.

Confirmation bias

Developers often feel beset by meetings, Slack pings, and slow build times. While all of these issues can be valid, confirmation bias can lead developers to overemphasize them or miss other, bigger issues that don’t confirm their already-held suspicions.

Survey questions are difficult to write

Put yourself in the shoes of a developer who just got a developer survey in their inbox. You click it open, and one of the first questions is seemingly simple: “From one to five, how busy did you feel in the past month?”

It’s a simple question that’s tough to answer. What does “busy” mean? How would you map individual senses of busyness to a five-point scale? What does a two really mean vs. a three? These questions can be difficult for survey-takers to answer and even more difficult for survey-givers to interpret. 

As Hall writes, “If you write bad survey questions, you get bad data at scale with no chance of recovery.” The more of this data you generate, the harder it is to get out from under it.

Surveys impose manual, interruptive work on survey designers and takers

Many developer surveys try to determine how busy developers are and how often they’re able to focus and maintain a flow state. It’s ironic, then, that so many developer surveys become impediments to focus. 

Developer surveys aren’t hard, but they do require total attention, and each can take as long as fifteen minutes to complete. The more frequent these surveys are, the more severe the collective interruption and the more likely survey fatigue sets in, encouraging developers to skim the questions and answer them quickly.

Consider the impact at scale when an enterprise has a team of 1000 developers interrupted for 15 minutes every quarter:

One 15-minute survey is 250 hours (6.25 FTE weeks!) of lost developer time. Repeat that four times a year, and that’s a ton of time lost – as many as 25 developer weeks – just to find out how developers feel about their workload.
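
The arithmetic is easy to check. Here’s the back-of-the-envelope calculation in Python, using the headcount and survey length from the example above:

```python
# Back-of-the-envelope cost of quarterly surveys for a
# 1,000-developer enterprise, as described above.
DEVELOPERS = 1_000
SURVEY_MINUTES = 15
SURVEYS_PER_YEAR = 4
FTE_HOURS_PER_WEEK = 40

hours_per_survey = DEVELOPERS * SURVEY_MINUTES / 60           # 250.0
fte_weeks_per_survey = hours_per_survey / FTE_HOURS_PER_WEEK  # 6.25
fte_weeks_per_year = fte_weeks_per_survey * SURVEYS_PER_YEAR  # 25.0

print(f"{hours_per_survey:.0f} hours per survey, "
      f"{fte_weeks_per_survey:.2f} FTE weeks each, "
      f"{fte_weeks_per_year:.0f} FTE weeks per year")
```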

How We Quantify Engineering Culture

“Engineering culture” has become something of a buzzword or, at worst, a touchy-feely HR recruiting tactic. Without an engineering culture, however — one defined by alignment and performance, not in-office ping-pong — developer experience loses its meaning. 

With the end of the zero-interest-rate (ZIRP) era, investments in developer experience still matter, but they need to support an engineering culture that drives results. We break engineering culture down into three components, each of which is emergent and quantifiable: alignment, technical performance, and team performance.

Alignment

As a technical leader, the most important work you can do is determining how best to allocate the right resources to the right initiatives.

Often, engineering leaders set their top five priorities based on intuition alone. As a result, they might correctly align a given engineering effort with a business result but misjudge how much time each effort actually needs.

If you don’t know how time-consuming a project will be, the team dedicated to it may be too small and become a bottleneck. Similarly, you can prioritize net-new work and deprioritize infrastructure work without realizing the latter is the biggest drag on accomplishing the former.

A range of questions comes up when you start thinking about alignment, including:

  • How much time are developers spending on net-new, high-value initiatives?
  • How much time are developers spending on addressing or managing tech debt?
  • How much time are developers spending just keeping the lights on (KTLO)?

Survey questions can’t answer these well. If tech debt is frustrating, confirmation bias might lead developers to think they spent more time on it than they actually did, and if developers expect to have a lot of KTLO work, the same bias can make that time sink less salient.

Ultimately, only a quantitative approach can answer these questions and inform the decisions supporting alignment.
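
To make that concrete, here’s a minimal sketch of the kind of rollup a quantitative approach produces. The work items, categories, and hour counts below are hypothetical stand-ins for data you would pull from an issue tracker:

```python
from collections import defaultdict

# Hypothetical work items, each tagged with an investment category
# and an effort estimate in hours. Field names are illustrative.
work_items = [
    {"category": "net-new initiative", "hours": 120},
    {"category": "tech debt",          "hours": 45},
    {"category": "KTLO",               "hours": 80},
    {"category": "net-new initiative", "hours": 60},
]

# Roll up hours by category and report each category's share.
totals = defaultdict(float)
for item in work_items:
    totals[item["category"]] += item["hours"]

grand_total = sum(totals.values())
for category, hours in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{category:<20} {hours:6.1f} h  ({hours / grand_total:.0%})")
```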

Technical performance

Here, questions of developer experience and developer productivity tend to mix. Organizations have a clear stake in productivity and performance, but developers also want to perform well. 

Technical performance, which breaks down at a high level into efficiency and quality, is a value developers and their organizations can agree on.

Though there are many ways to measure productivity, DORA (DevOps Research and Assessment) is the standard approach, and its methodology breaks performance into three components:

  • Velocity metrics, including deployment frequency and lead time for changes.
  • Stability metrics, including mean time to recovery and change failure rate.
  • Reliability metrics, including availability, latency, performance, and scalability.

At a glance, you can see how difficult — if not impossible — it would be to capture these metrics with developer surveys alone. DORA metrics also tend to be lagging indicators – useful, but more useful if you have context. With a quantitative approach, you can capture both the DORA metrics and the more actionable metrics that lead to them. 
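
As a rough illustration, here’s how the classic DORA velocity and stability metrics fall out of raw deployment records (the reliability component would come from observability tooling and is omitted here). The records below are hypothetical; in practice they would come from your CI/CD pipeline and incident tracker:

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical deployment records:
# (commit time, deploy time, caused a failure?, time to restore)
deployments = [
    (datetime(2024, 6, 3, 9),   datetime(2024, 6, 4, 14),  False, None),
    (datetime(2024, 6, 10, 11), datetime(2024, 6, 11, 9),  True,  timedelta(hours=3)),
    (datetime(2024, 6, 17, 8),  datetime(2024, 6, 18, 16), False, None),
]
WINDOW_DAYS = 30  # measurement window for deployment frequency

lead_times = [deployed - committed for committed, deployed, *_ in deployments]
restores = [restore for *_, failed, restore in deployments if failed]

print(f"Deployment frequency: {len(deployments) / WINDOW_DAYS:.2f}/day")
print(f"Mean lead time: {mean(lt.total_seconds() for lt in lead_times) / 3600:.1f} h")
print(f"Change failure rate: {len(restores) / len(deployments):.0%}")
print(f"Mean time to recovery: {mean(r.total_seconds() for r in restores) / 3600:.1f} h")
```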

Team performance

Developers work on teams, and a good team is more than the sum of its parts. To know whether your teams are reaching that bar, you have to ask questions like:

  • Are individual engineers consistently able to enter and remain in a flow state?
  • Can team members cross information silos and get help when they need it?
  • Can the team work at a sustainable pace while minimizing interruptions and maximizing collaboration?

Surveys can’t accurately capture this information because even the best survey writer lacks the full picture and because even the best survey takers will fall prey to the aforementioned biases. 

Take the challenge of enabling deep work despite interruptions, for example. Frequent, small interruptions can fade from memory, and rarer, bigger interruptions can stand out and feel disproportionately burdensome. Despite good intentions, intuition into team performance can only go so far. 
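
To see what measuring this looks like, here’s an illustrative sketch that counts deep-work blocks as the gaps between interruption events rather than relying on recall. The timestamps and the two-hour threshold are hypothetical:

```python
from datetime import datetime, timedelta

DEEP_WORK_THRESHOLD = timedelta(hours=2)

# One developer's day: interruptions (meetings, pings, reviews) are
# treated as instantaneous events for simplicity.
day_start = datetime(2024, 6, 3, 9, 0)
day_end = datetime(2024, 6, 3, 17, 0)
interruptions = [
    datetime(2024, 6, 3, 9, 30),
    datetime(2024, 6, 3, 13, 15),
    datetime(2024, 6, 3, 13, 45),
]

# Deep-work blocks are the uninterrupted gaps longer than the threshold.
boundaries = [day_start, *sorted(interruptions), day_end]
gaps = [later - earlier for earlier, later in zip(boundaries, boundaries[1:])]
deep_blocks = [gap for gap in gaps if gap >= DEEP_WORK_THRESHOLD]

total = sum(deep_blocks, timedelta())
print(f"{len(deep_blocks)} deep-work blocks, {total.total_seconds() / 3600:.1f} h total")
```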

In contrast, Accolade, a telehealth company with over 300 engineers, quantified developer work and found an "almost exact correlation" between its deep work and context-switching metrics.

By having visibility into these metrics, Accolade was able to increase deep work time by 20% and deployments to production by 205%.
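
As an illustration of the kind of relationship Accolade surfaced, here’s how you might check the correlation between deep-work hours and context switches once both are quantified. The numbers are invented for the sketch (statistics.correlation requires Python 3.10+):

```python
from statistics import correlation

# Invented weekly metrics for one team: as context switches rise,
# deep-work hours fall.
deep_work_hours  = [14, 11, 9, 6, 5, 3]
context_switches = [4, 6, 9, 12, 14, 17]

# A Pearson r near -1 indicates a strong inverse relationship.
print(f"Pearson r = {correlation(deep_work_hours, context_switches):+.2f}")
```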

Turning Chaos into Meaning

When you can fit your team in a room or walk down a single line of desks, you can just ask people questions and get a pretty good sense of what’s going on. But all too soon, companies grow too big to allow for those methods. 

Many companies don’t shift to quantitative measurements, however, because they’re wary of the overwhelming data this work can present. Here, you can’t skimp – if a developer appears unproductive only because they weren’t programming during an unseen Slack huddle, then all your measurements will be off. 

An overwhelming amount of information is a familiar use case for machine learning (ML). With ML, for example, teams can surface useful insights from otherwise messy, chaotic Jira boards so that they can allocate the right resources to the right projects. Similarly, ML allows teams to pre-categorize types of Slack messages so that they can track which ones are true interruptions, building a high-level view without ongoing manual work.
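
Here’s a deliberately tiny sketch of that second idea: training a text classifier to separate true interruptions from ambient chatter. The messages and labels are hypothetical, and a production model would need a far larger labeled corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled Slack messages.
messages = [
    "prod is down, can you look now?",
    "build is broken on main, need a hand",
    "can someone review my PR before EOD?",
    "lunch anyone?",
    "great talk yesterday!",
    "happy friday, team",
]
labels = ["interruption", "interruption", "interruption",
          "chatter", "chatter", "chatter"]

# TF-IDF features feeding a logistic-regression classifier.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(messages, labels)

print(classifier.predict(["the deploy is failing, ping me asap"]))
```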

By analyzing these issues from a holistic perspective, engineering leaders can better discern downstream and upstream issues, and with quantified results, they can diagnose issues that would otherwise be unintuitive. 

If developers are less productive from one quarter to the next, for example, the source of the problem might be a non-obvious workflow issue. A good example is this 2018 study on task interruption in software development projects, which showed that, even though 81% of participants predicted external interruptions would be the worst kind of interruption, self-interruptions were much worse for productivity.

Without the right metrics, developers fall back on unseen biases, and engineering leaders risk turning the wrong dials and increasing the risk of burnout.

With the right metrics, engineering leaders can bring developers into the productivity and experience discussions, allowing everyone to have healthy, informed conversations through data.

Without Context, Developer Experience Is Just Noise

You want your developers to be happy, but happiness depends, in part, on the company’s ability to tie effort to results. The more engineering leaders can prove the value of their developers – and the more they can align their developers, amplifying the results of their work – the more developers can contribute to the business. 

Developer surveys are a necessary but insufficient input. With quantitative measurements, teams can sustainably measure experience in a granular, objective manner while complementing those results with qualitative results from rarer, more thorough surveys.

Developer experience is intrinsically valuable for team morale and well-being, but without the larger context of engineering effectiveness provided by quantitative measures, it can’t inform lasting decisions.