Story Point Estimation Doesn’t Work

    Written By Nick Moore

    “I like to say that I may have invented story points, and if I did, I’m sorry now.” So writes Ron Jeffries, one of the original signatories of the Agile Manifesto and the person behind story points as we know them today.

    Story points, from one perspective, seem canonical to modern software development. As with “Google,” the industry has turned the term into a verb (e.g., “Can you point that?”), and if you walked into a tech company’s office on a Monday morning, it wouldn’t be surprising to see a small ring of developers holding up their hands – one finger here, three there, five there.

    But from another perspective, ranging from Jeffries’s regretful one to competing stances from the FAANGs, the era of story points is over. 

    Story points are a way to estimate the relative size of specific tasks the development team — often led by a project manager or Scrum master — is currently planning. At first glance, that’s all they are, but if you look at how story points became so popular, you can see paths not chosen, definitions changed, and tradeoffs accepted or refused. 

    Fast forward to today, and it’s unclear whether story points are really the best way to estimate developer time at all. And if you look closer, the better ways involve much bigger shifts than holding up different numbers of fingers. 

    Time Estimation: A Brief History

    Time estimation as we know it today starts with Frederick Taylor's scientific management approach in the early 1900s, which emphasized strict control over worker processes. As Martin Fowler, another of the original signatories of the Agile Manifesto, writes, “You didn't want [workers] to decide how they should make a particular piece of machinery, but somebody else, somebody who was more intelligent and educated.”

    In the 1980s and 1990s, the software industry followed a Tayloristic model and asked experts to figure out processes companies could slot software developers into. But the industry soon realized that this model didn't suit the creative and unpredictable nature of coding.

    In theory, you can dictate the workflow of a worker on the factory floor because the work is the same from one unit to the next. In software development, little is routine.

    [Figure: the cone of uncertainty]

    The “cone of uncertainty” above illustrates the difficulty in predicting software development timelines. As a project progresses, estimates become more accurate, but initial predictions are highly variable. This realization led to the introduction of story points, an abstract estimation method that focuses on task size rather than duration.

    Story points are a way of mitigating the cone of uncertainty.

    Story points are estimated in abstract terms to reinforce the focus on size rather than duration. Proxies include a poker deck (with cards representing task estimates), t-shirt sizes (with tasks categorized into sizes), and the Fibonacci sequence (with 1, 2, 3, 5, 8, etc., used to estimate tasks along a non-linear scale).

    You can think of these proxies as solving the same kind of problem you might face when estimating the size of a dog. You’d likely struggle to guess a random dog’s weight, but you could reliably say it’s as small as a Chihuahua or as big as a Saint Bernard. 

    The advantage here is that if a team determines a task is relatively huge, they can break the problem down further into relatively smaller tasks. 
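
    To make the Fibonacci proxy and the break-it-down rule concrete, here is a minimal Python sketch. The cutoff of the scale at 21, the split threshold of 13, and the function names `snap_to_scale` and `needs_splitting` are assumptions chosen for illustration, not part of any standard; teams calibrate their own.

```python
from bisect import bisect_left

# Hypothetical scale and threshold; real teams pick their own values.
FIBONACCI_SCALE = [1, 2, 3, 5, 8, 13, 21]
SPLIT_THRESHOLD = 13  # assumed: at or above this, break the task down


def snap_to_scale(raw_size: float) -> int:
    """Round a rough relative size up to the next value on the scale."""
    index = bisect_left(FIBONACCI_SCALE, raw_size)
    # Anything beyond the largest value is clamped to it.
    return FIBONACCI_SCALE[min(index, len(FIBONACCI_SCALE) - 1)]


def needs_splitting(points: int) -> bool:
    """Flag tasks that are relatively huge and should become smaller tasks."""
    return points >= SPLIT_THRESHOLD
```

    With these assumed values, a rough guess of 4 snaps up to 5, and a guess of 9 snaps up to 13, which is large enough to be flagged for splitting.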

    The dot-com boom and subsequent crash created a divergence in estimation practices. 

    In the 1990s, numerous Internet companies started and flourished, and most didn’t use Agile because it simply hadn’t been invented yet. But, as Apple developer Adam Ruka writes, “After the dot-com bubble burst in the early 2000s, basically all successful software startups founded in later years – companies like Facebook, Uber, Twitter, Netflix, Stripe, AirBnB, and many others – followed the playbook of the Internet boom’s early giants, and that meant foregoing Agile for the most part.”

    This is how we ended up in a world where story points can seem both canonical and outdated. Many traditional companies, which fused technology departments onto pre-existing businesses, embraced Agile, Scrum, and story points, but many “pure” technology companies used other methodologies. 

    Why Story Points Aren’t Effective for Enterprise Time Allocation

    Decades of evidence have now shown us that story points are more than likely a legacy concept. Given the history, we can see the good reasons why companies use them, but we can also see new strategies that solve old problems better. 

    How story points can fail

    Story points are extremely susceptible to failure, and that possibility is itself a flaw: The less you can trust a story point estimate, the worse the planning built on it will be. More specifically, story points have three drawbacks:

    • Variability across teams. Different teams within the same company often have different systems for assigning points. Variance is even bigger across companies, so a newly hired developer, nominally experienced in Scrum, might still need to relearn a new system.

    • Susceptibility to bias. Personal and team biases can affect story point estimation—sometimes dramatically. If a developer has dealt with a similar bug in the past, for example, that experience can easily cause them to overestimate or underestimate the work involved in the next bug. 

    • Inability to capture unplanned work. Pointing is supposed to be abstract enough to free developers from specific timelines, but the rigidity of story points often means it’s still a challenge to account for work that pops up during a sprint. 

    At their best, story points are meant to estimate task size and effort. In small, localized contexts – in small teams or across a couple of small teams – they can work pretty well. But beyond this use case, story points tend to break down. 

    Despite being proxies, the numbers used are frequently too tempting for managers and executives to ignore. As Jeffries writes, “I think given two teams producing things, it’s an irresistible temptation for many managers to compare them. I think it’s irresistible enough that I’d drop the notion of story points and even the notion of estimating stories at all, where possible.”

    Story points were never meant to measure global productivity or velocity, and their context-specific nature means they fall apart when used to do anything other than help a specific team think about their upcoming tasks. 
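
    A small worked example shows why the raw numbers don't transfer between teams. The task names and point values below are invented for illustration: both hypothetical teams finish the same three tasks in a sprint, but each calibrated its own scale.

```python
from __future__ import annotations

# Hypothetical data: identical work, pointed on two team-specific scales.
TEAM_A_POINTS = {"login form": 2, "payment retry": 5, "audit log": 3}
TEAM_B_POINTS = {"login form": 5, "payment retry": 13, "audit log": 8}


def velocity(points_by_task: dict[str, int]) -> int:
    """Sum of points for the tasks completed in a sprint."""
    return sum(points_by_task.values())


def relative_sizes(points_by_task: dict[str, int]) -> dict[str, float]:
    """Each task's share of the team's own total, rounded to two places."""
    total = velocity(points_by_task)
    return {
        task: round(points / total, 2)
        for task, points in points_by_task.items()
    }
```

    Team B's velocity comes out at 26 against Team A's 10, even though the work delivered is identical. Within each team, the relative sizes still roughly agree (the payment retry is half the sprint for both), which is the only thing the points were meant to convey.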

    The fall of Skype

    Companies frequently fail because they don't iterate fast enough, but that failure will look like an overall product failure, not the result of a process failure. As a result, claims about whether story points work are almost unfalsifiable.

    That said, you can still gather evidence. If developer morale is low, a productivity drop will follow, and if pointing (alongside a heap of other Agile rituals) lowers morale, then we can predict issues. Similarly, if a company that uses story points falls behind a company that doesn’t, we can make some inferences. 

    Gergely Orosz, the writer behind The Pragmatic Engineer newsletter and a former developer at Uber and Skype, provides an example. When Orosz joined Skype in 2012, the company was fully committed to Scrum (and story points, as a result). 

    Initially, this was a success. “We went from shipping the flagship Windows app once-a-quarter at best to monthly shipping,” Orosz writes. 

    Around the same time, however, WhatsApp, founded in 2009, started to outpace Skype. “Though a much smaller organization,” Orosz writes, “Whatsapp chipped away market share month after month, becoming the leading communications platform.” WhatsApp didn’t bother with Scrum. According to Orosz, WhatsApp “deliberately ignored all heavyweight processes.”

    Eventually, despite Skype’s early lead, WhatsApp won.

    Of course, it’s impossible to attribute failures like these to any one cause. If WhatsApp were the lone tech company that succeeded without story points, we could write it off as an exception to the rule, but the opposite is true.

    What FAANG Companies Do Instead

    When Agile first rose to popularity, many developers greeted it as a savior because they hated Waterfall development – an even older methodology that asked developers to build everything before showing end-users and stakeholders even a bit of code. 

    As a result, when people hear about companies not using Agile, they tend to assume those companies are still using Waterfall. However, non-Waterfall methodologies preceded Agile, and companies have used them to great success throughout Agile’s rise and fall.

    In fact, when you look at what major companies use today, you’ll hardly find any mention of Scrum or story points. In many companies, there isn’t a central methodology at all, and when teams have a process, it tends to be much more lightweight than Scrum. 

    The more you look, the more counterexamples you’ll find. Amazon often uses a system called Working Backwards, and Basecamp has published a thorough guide to its own system, Shape Up.

    Better Approaches for Estimating Developer Time

    There are many different ways to estimate developer time, and those ways likely vary among industries, companies, and teams. If you’re looking to do something other than story points, however, you can broadly sort the options into two categories: more qualitative and more quantitative.

    Methodologies that are more qualitative than story points either don’t ask developers to make time estimates at all or ask them to estimate without reference to any specific system or proxy.

    Methodologies that are more quantitative than story points use modern platforms to build data-driven metrics that development teams can use to provide accurate estimates. Teams needn’t know the Fibonacci sequence or worry that bias is going to distort their estimates.

    But the two approaches aren't mutually exclusive.

    With an engineering intelligence platform like Uplevel, you can account for how developers actually spend their time (including meetings, out-of-office time, deep work, and interruptions) and help developers improve their estimates with information they wouldn’t be able to intuit on their own.  

    Accolade, for example, used Uplevel to achieve a 20% increase in deep work time and a 205% increase in deployments. With the right platform and methodology, development teams can combine the more qualitative effects of inputs like deep work with the more quantitative effects of outputs like deployments, allowing them to go far beyond pointing. 
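
    As a rough sketch of what accounting for actual time can look like, the Python below subtracts meetings, out-of-office time, and interruptions from a scheduled week and treats what remains as deep work capacity. It does not represent Uplevel's product or API; the `WeekProfile` class, the `weeks_needed` function, and every number are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WeekProfile:
    """Hypothetical weekly time breakdown for one developer, in hours."""

    scheduled_hours: float
    meeting_hours: float
    out_of_office_hours: float
    interruption_hours: float

    def deep_work_hours(self) -> float:
        """Hours left for focused work after everything else is subtracted."""
        remaining = (
            self.scheduled_hours
            - self.meeting_hours
            - self.out_of_office_hours
            - self.interruption_hours
        )
        # Never report negative capacity for an overbooked week.
        return max(remaining, 0.0)


def weeks_needed(focused_hours_required: float, profile: WeekProfile) -> float:
    """Calendar weeks a task takes at this developer's actual focus capacity."""
    capacity = profile.deep_work_hours()
    if capacity == 0:
        raise ValueError("no deep work time available in this profile")
    return focused_hours_required / capacity
```

    Under these assumptions, a developer scheduled for 40 hours with 12 hours of meetings, 8 hours out of office, and 5 hours lost to interruptions has 15 hours of deep work. A task needing 30 focused hours then takes about two calendar weeks, not the three-quarters of a week a naive 40-hour assumption would suggest.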
