Why Estimation Is Broken (And What Actually Works)

Twenty years of software estimation taught me that hour estimates are fiction. Here's what actually works for planning.

Core: Your company probably asks engineers to estimate how long tasks will take. Your engineers probably give estimates that are wrong. This isn’t the engineers’ fault—estimation is fundamentally unreliable for novel work.

Why Hour Estimates Are Fiction

Detail: In 2010, we were asked to estimate a feature: “Add payment retry logic.” It seemed straightforward: retry a failed payment up to two more times. A senior engineer estimated 8 hours.

The estimate was wrong. Not because they were bad at estimating, but because estimation is fiction for novel work. Here’s what we encountered:

  • Payment provider API documentation was inconsistent with actual behavior (lost 3 hours reading docs and testing against the real API)
  • Retry logic needed to interact with billing system (unexpected dependency, 3 hours)
  • Tests discovered edge case in existing payment flow (1 hour fixing)
  • Deployment revealed race condition in retry logic (1 hour fixing)
  • Monitoring showed retry logic was logging too verbosely (created log storage issues, 2 hours tuning)

The 8-hour estimate turned into 18 hours. Nobody was incompetent; the work had unknowns that appeared during execution.

Here’s the thing: any expert looking at the task would probably also estimate 8 hours. The unknowns weren’t obvious before diving in. They became obvious during work.

Application: Hour estimates are fiction for novel work. If the work is routine (you’ve done it 50 times), estimates are reasonable. If the work is novel, estimates are guesses. Stop pretending they’re reliable.

The Estimation Bias: Planning Fallacy

Core: Humans are bad at estimating because we’re optimistic by default. We imagine the happy path.

Detail: When asked to estimate, your brain imagines: “I’ll code feature X, tests pass, deploy succeeds, done.” Your brain doesn’t imagine: “I’ll spend 3 hours on a bug in dependency Y that appears at 2 AM.”

This is the planning fallacy. We estimate the time for happy-path execution, then act surprised when things go wrong. Things always go wrong. The difference between “estimate” and “actual” is usually the cumulative time spent on surprises.

Research shows planning fallacy is pervasive across domains. Students estimate study time for exams, then take longer. Companies estimate software project timelines, then miss them. This isn’t stupidity—it’s how human brains work.

The consequence: schedules are based on optimistic fiction. Every organization then adds a “buffer” (multiply estimates by 1.5x or 2x) hoping to account for unknowns. This is mathematically incoherent but pragmatically reasonable.

# Example: Why estimation fails (biased optimism)

class EstimationBias:
    def estimate_task(self, task: str) -> int:
        """
        The human estimator considers the happy path.
        Unknowns don't influence the estimate because they're unknown.
        """
        if task == "add payment retry":
            # Optimistic estimate: 8 hours
            # Imagines: code → test → deploy ✓
            # Doesn't imagine: API inconsistencies, hidden dependencies, edge cases
            return 8
        raise ValueError(f"no estimate recorded for task: {task!r}")

    def actual_time(self, task: str) -> int:
        """Reality: happy path + unknowns."""
        if task == "add payment retry":
            happy_path = 8  # the part the estimate imagined
            unknowns = 10   # the surprises added another ~10 hours
            return happy_path + unknowns
        raise ValueError(f"no actual time recorded for task: {task!r}")

    def estimate_accuracy(self, task: str) -> float:
        estimated = self.estimate_task(task)
        actual = self.actual_time(task)
        return (estimated - actual) / actual  # (8 - 18) / 18 ≈ -0.56, a 56% underestimate

# The bias is CONSISTENT in direction across estimates:
# they're almost never right, and they're usually 30-50% underestimates.

# The problem: you can't just multiply by 1.5x,
# because the multiplier differs per task depending on its unknowns.

estimates = [8, 5, 3, 12, 20]  # hours
actuals = [18, 7, 4, 15, 22]   # hours
multipliers = [a / e for a, e in zip(actuals, estimates)]
# [2.25, 1.4, 1.33, 1.25, 1.1]
# Not consistent! Some tasks run 2.25x over, some 1.1x.
# You can't use a fixed multiplier.

# The real lesson: estimation error grows with task novelty.
def estimate_with_confidence(task: str, novelty_level: str) -> tuple[int, tuple[float, float]]:
    """Return (point_estimate, (low, high)) in hours."""
    base_estimate = 8  # whatever the happy-path estimate is

    if novelty_level == "routine":     # done 50+ times
        return base_estimate, (base_estimate * 0.8, base_estimate * 1.2)
    elif novelty_level == "familiar":  # done 5-10 times
        return base_estimate, (base_estimate * 0.7, base_estimate * 1.5)
    elif novelty_level == "novel":     # first time
        return base_estimate, (base_estimate * 0.5, base_estimate * 2.5)
    else:                              # research / unknown territory
        return base_estimate, (base_estimate * 0.2, base_estimate * 5.0)
Application: If you must estimate, provide confidence ranges, not single numbers. “8 hours ±4 hours” is more honest than “8 hours.” Recognize that the range widens, and confidence shrinks, as the novelty of the work increases.
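
To make that concrete, here is a hypothetical usage of the estimate_with_confidence sketch above, reporting a range instead of a point estimate (the task name and numbers are illustrative, not real data):

# Example: report a range, not a single number (illustrative)
estimate, (low, high) = estimate_with_confidence("add payment retry", "novel")
print(f"Best guess: {estimate}h, realistic range: {low:.0f}-{high:.0f}h")
# Best guess: 8h, realistic range: 4-20h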

What Actually Works: Story Points

Core: Instead of hour estimates, use story points—relative sizing without claiming accuracy.

Detail: Instead of asking “how many hours,” ask “compared to that other task we did, how much work is this?” This removes the illusion of accuracy while providing useful comparison.

If task A took roughly 8 hours and you sized it at 3 points, a similar but somewhat more complex task B gets 5 points. You’re not claiming accuracy; you’re estimating relative complexity.

The magic: story points work because they’re calibrated by comparison, not by absolute hours. “5 points means 20 hours” is meaningless, but “5 points is bigger than 3 points” is useful for planning.

Teams can track velocity (points per sprint) and plan future sprints. If a backlog adds up to 240 points and you consistently deliver 40 points per sprint, you know it’s roughly 6 sprints of work. Individual task accuracy doesn’t matter if your team’s aggregate is well calibrated.

# Example: Story points vs hours

class PlanningWithStoryPoints:
    def __init__(self):
        self.velocity_per_sprint = 40  # Team average: 40 points per sprint
        self.sprint_weeks = 2
    
    def estimate_roadmap(self, tasks: list[tuple[str, int]]) -> dict:
        """
        Tasks: list of (task_name, story_points)
        Returns: estimated timeline
        """
        total_points = sum(points for _, points in tasks)
        sprints_needed = total_points / self.velocity_per_sprint
        weeks = sprints_needed * self.sprint_weeks
        
        return {
            "total_points": total_points,
            "sprints": sprints_needed,
            "weeks": weeks,
            "note": "Assumes team velocity remains stable"
        }

# Example roadmap
roadmap = [
    ("payment retry logic", 5),
    ("user dashboard redesign", 8),
    ("API rate limiting", 3),
    ("notification system", 13),
    ("search optimization", 8),
]

planner = PlanningWithStoryPoints()
timeline = planner.estimate_roadmap(roadmap)
# {
#     'total_points': 37,
#     'sprints': 0.925,
#     'weeks': 1.85,
#     'note': 'Assumes team velocity remains stable'
# }

# Now the team knows: roughly one sprint (~2 weeks) of work
# If velocity drops (new team members) or spikes (focused effort), the measured velocity changes
# and the roadmap recalibrates at the next planning pass

# Key: Story points never claim "this is 18 hours"
# They claim "this is bigger than that other thing"
# That claim is reliable. The hours claim isn't.

Application: Use story points for planning. Track team velocity (points per sprint). Use velocity for roadmapping, not for individual task accuracy.
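
What “track velocity” can look like in practice, as a minimal sketch (the sprint history below is invented for illustration): use a rolling average over recent sprints rather than a single hard-coded number.

# Example: calibrating velocity from recent sprints (illustrative sketch)

def rolling_velocity(points_delivered_per_sprint: list[int], window: int = 3) -> float:
    """Average points actually delivered over the last `window` sprints."""
    recent = points_delivered_per_sprint[-window:]
    return sum(recent) / len(recent)

history = [32, 41, 38, 44, 39]        # points delivered per sprint (made up)
velocity = rolling_velocity(history)  # (38 + 44 + 39) / 3 ≈ 40.3

# Feed this into PlanningWithStoryPoints instead of a hard-coded 40,
# and the roadmap tracks the team as it changes.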

The Real Bottleneck: Unknowns

Core: Estimation isn’t really the problem. Unknowns are.

Detail: When the payment retry task took 18 hours instead of 8, the extra 10 hours were unknowns: “I didn’t know the API behaved this way” (5 hours), “I didn’t know this would interact with billing” (3 hours), “I didn’t know this edge case existed” (2 hours).

These unknowns are hard to estimate because they’re unknown. You can’t know what you don’t know.

The real solution: reduce unknowns before estimating. Spend time researching: “How does the payment provider API actually work?” “What systems does payment touch?” “What edge cases exist in current payment logic?”

After 2-3 hours of research, suddenly the unknowns shrink. The estimate becomes more accurate not because estimation improved, but because unknowns became known.
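
As a rough illustration, reusing the novelty levels from the estimate_with_confidence sketch earlier (the numbers are hypothetical):

# Example: a research spike narrows the range (hypothetical numbers)

before = estimate_with_confidence("add payment retry", "novel")
# (8, (4.0, 20.0))  <- wide range: the work is full of unknowns

# Spend 2-3 hours reading the provider docs, tracing what touches billing,
# and looking for edge cases in the existing payment flow. The task hasn't
# changed, but it now behaves like familiar work:
after = estimate_with_confidence("add payment retry", "familiar")
# (8, (5.6, 12.0))  <- narrower range: the unknowns became known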

The implication: if management rushes you to estimate before you’ve researched, the estimate will be fiction. Push back on that.

The Broken Incentive: Estimates as Promises

Core: Estimation breaks when estimates become promises.

Detail: Early in my career, I’d estimate “3 days” and the manager would say “Great, 3 days for feature X.” Six days later, I’d explain it took longer. The manager would respond, “But you estimated 3 days.”

The estimate had become a promise. I’d missed my “commitment.” This was backwards. The estimate wasn’t a promise—it was a guess. But the incentive system treated it as a promise.

When estimates become promises, engineers start padding them. “I’ll say 10 days knowing it might take 5, so I’m covered.” Padding wastes time, because work tends to expand to fill the padded estimate, and it hides how fast the team can actually move.

The real question isn’t “do you promise to deliver in 3 days?” It’s “what’s your best guess with what you know now?” and “what are the unknowns that could change that guess?”

Application: Never use estimates as performance metrics. “You estimated 3 days, missed it, therefore you’re bad at estimating” is backwards. Treat estimates as forecasts, not promises. Measure performance on outcome (did we solve the problem?), not adherence to estimates (did we hit the 3-day guess?).
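
If you track estimates at all, here is a minimal sketch of the forecast-not-promise framing (the task data is invented): record estimate/actual pairs to widen the team’s future ranges, never to grade individuals.

# Example: estimates as forecasts - track error to recalibrate, not to blame

def calibration_report(tasks: list[tuple[str, float, float]]) -> dict:
    """tasks: (task_name, estimated_hours, actual_hours). Team-level only."""
    ratios = [actual / estimated for _, estimated, actual in tasks]
    return {
        "median_overrun": sorted(ratios)[len(ratios) // 2],
        "worst_overrun": max(ratios),
        "note": "Use this to widen future ranges, not to score engineers",
    }

report = calibration_report([
    ("payment retry", 8, 18),
    ("rate limiting", 3, 4),
    ("dashboard redesign", 12, 15),
])
# median_overrun ≈ 1.33, worst_overrun = 2.25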

What I Wish I’d Done Differently

Core: I spent years trying to estimate accurately. It was wasted effort.

I should have:

  1. Stopped trying to estimate novel work with hours
  2. Invested in research to reduce unknowns before estimating
  3. Used story points for relative sizing
  4. Tracked velocity to calibrate future estimates
  5. Communicated that estimates are forecasts, not promises

The companies that worked best were those that accepted estimation uncertainty and planned around it. “We think this is 3 weeks. If we encounter unknowns, it could be 4-5 weeks. Let’s start and recalibrate as we learn.”

The companies that failed were those that tried to eliminate estimation uncertainty through oversight. “Estimate 3 weeks, hit 3 weeks, good.” That required either overestimating (padding) or finishing the work 2 weeks in and quietly sitting on it until the “estimate” was hit.


Hero Image Prompt: “Estimation accuracy visualization showing broken accuracy models. Left side: hour estimates with wide miss ranges (estimated 8 hours, actual 18 hours). Center: comparison with story points and velocity-based planning showing convergence over time. Right side: successful roadmap using velocity instead of accuracy. Include graph showing velocity stabilizing over sprints. Dark professional theme with red (missed estimates) and green (velocity-based planning working) zones.”