Thoughts on Mik Kersten’s Flow Framework

I read Mik Kersten’s helpful book Project to Product shortly after he published it but didn’t comment on it at the time. A colleague recently asked about the Flow Framework it contains, though, so I’m now sharing some thoughts on it.

In general, anytime someone promotes the principles of flow management and thinking — especially when he or she is well received for it — I am grateful. That’s largely the case with Kersten’s book: insofar as it has increased awareness and even the practice of flow management, I commend it. In the interest of giving credit where it’s due, though, I offer a few words of clarification. The flow metrics that Kersten advocates are not new; they are repackaged (skillfully and helpfully, to be sure) versions of what many people have been advocating for a while, most notably those in the kanban community.

To that end, here’s a guide mapping the elements of the Flow Framework to the concepts they are based on. I don’t find much added value in the terms that Kersten uses, and I find Flow Velocity actively unhelpful: it takes a term that already has a well-known meaning in agile circles (velocity) and applies it to something we already had a perfectly useful word for (throughput).

Here is each Flow Framework metric with its description, followed by what it’s otherwise known as, with definitions taken from Essential Kanban Condensed where possible:

  • Flow Distribution: Mutually Exclusive and Comprehensively Exhaustive (MECE) allocation of flow items in a particular flow state across a measure of time. Otherwise known as Work-Item Type Distribution: the work-type mix/distribution (i.e., % allocation to work types).
  • Flow Velocity: Number of items done in a given time. Otherwise known as Throughput or Delivery Rate. Throughput: the number of work items exiting a system or sub-system per unit of time, whether completed or discarded. Delivery Rate: the number of work items emerging complete from the system per unit of time.
  • Flow Time: Time elapsed from when a flow item enters the value stream to when it is released to the customer. Otherwise known as Delivery Time, Time to Value, Time in Process or (System) Lead Time: the elapsed time it takes for a work item to move from the commitment point to the delivery point. Informally, or if qualified, it may refer to the time it takes to move through a different part of the process.
  • Flow Load: Number of flow items with flow state as active or waiting (i.e., work in progress). Otherwise known as Work in Progress: the work items that have entered the system or state under consideration, but that have not yet been either completed or discarded.
  • Flow Efficiency: The proportion of time flow items are actively worked on to the total time elapsed. Otherwise known as (the same) Flow Efficiency: the ratio of the time spent working on an item (Touch Time) to the total Time in Process.
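
For the quantitatively inclined, the mapping is easy to check against real data. Here’s a minimal Python sketch (the work-item structure and numbers are my own illustration, not from either book) that computes the kanban-community versions of these metrics from a list of work items:

    from datetime import date

    # Illustrative work items: type, commitment date, delivery date
    # (None = still in progress) and days of active (touch) time.
    items = [
        {"type": "feature", "committed": date(2021, 3, 1), "delivered": date(2021, 3, 15), "touch_days": 4},
        {"type": "defect", "committed": date(2021, 3, 5), "delivered": date(2021, 3, 9), "touch_days": 2},
        {"type": "feature", "committed": date(2021, 3, 8), "delivered": None, "touch_days": 3},
    ]

    done = [i for i in items if i["delivered"]]
    window_days = 30

    # Throughput/delivery rate ("Flow Velocity"): items finished per unit of time.
    throughput = len(done) / window_days

    # (System) lead time ("Flow Time"): commitment point to delivery point.
    lead_times = [(i["delivered"] - i["committed"]).days for i in done]

    # Work in progress ("Flow Load"): entered the system, not yet completed or discarded.
    wip = sum(1 for i in items if i["delivered"] is None)

    # Flow efficiency: touch time as a proportion of total time in process.
    efficiency = [i["touch_days"] / t for i, t in zip(done, lead_times) if t > 0]

    # Work-item type distribution ("Flow Distribution"): % allocation by work type.
    mix = {t: sum(1 for i in items if i["type"] == t) / len(items) for t in {i["type"] for i in items}}

    print(throughput, lead_times, wip, efficiency, mix)

Nothing in that sketch is new with the Flow Framework; it’s the same arithmetic kanban practitioners have been running on their boards for years.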

Kanban Myths

[Editor’s note: I was surprised to realize that I had never written my own list of kanban myths, probably because so many other good ones exist. But here’s my take.]

Like the one about George Washington, myths about kanban persist long after evidence to the contrary has emerged. Before I list some of them, I’ll start with a few connotations of kanban, since I think that most of the myths stem from a misunderstanding of kanban. Of course, I’ll happily work to understand your own connotation of kanban. As for me, when I talk about kanban, while I respect all of the following connotations, I really consider only the first two to be valid and the latter two to be shallow or incomplete:

  • (Valid) A pull system enabled by virtual kanbans which signal capacity and create flow, reduce overburdening and yield other benefits. 
  • (Valid) A method for defining, managing, and improving services that deliver knowledge work, such as professional services, creative endeavors, and the design of both physical and software products (a.k.a., Kanban Method).
  • (Shallow or incomplete) A work board (analog or digital) that makes knowledge or intangible-goods work visible. Since the intangible-goods context derives from the physical kanban system of tangible-goods manufacturing, though, the kanbans are really virtual in these environments. The tickets on these boards are signals of demand, not capacity (which is what kanbans indicate). Therefore, any board that doesn’t have pull signals (indicated either by an explicit work-in-progress limit or a space-limited area) really shouldn’t be called a kanban board. It’s probably only a work-visualization/management board or simply a “work board.” At best, I would consider this a “proto-kanban” board.
  • (Shallow or incomplete) Continuous-flow-based delivery or timebox-decoupled delivery cadences, or “Scrum without Sprints.”

All right — on to those myths of kanban, or what kanban is not:

  • Kanban is advanced agile, and it’s better to start with Scrum: See Don’t mistake adoption patterns for maturity patterns.
  • Kanban needs all work items/user stories to be similarly sized: I’m not sure how this one got started, but it’s simply untrue. What about a pull system necessitates that the work be the same size? (I won’t even ask how you know that your non-repetitive work is even of a knowable size in the first place.)
  • Kanban teams don’t have a sense of urgency: If by “sense of urgency” you mean that at the end of a sprint, everyone rushes to complete tickets, resulting in overburdened workers, shoddy quality and unintegrated work, then it’s true that kanban doesn’t create this sort of environment. If you mean clarity of expectations (in the form of service-level expectations and explicit policies), guidance on how to handle work based on a customer’s needs (in the form of classes of service), and a focus on flow that enables getting things done, then this claim is false.
  • Kanban teams can do whatever they want and don’t require discipline: I find it to be just the opposite in reality. Teams that otherwise fancy themselves disciplined and/or “agile” often somehow can’t summon up the basic discipline to implement WIP limits to create a simple pull system. Teams who use Kanban Method are by definition continuously improving, because they “agree to pursue improvement through evolutionary change.”
  • Kanban is only for x work (where x usually means ops or support): Kanban allows us to see how our work works, creates flow and reduces overburdening. If you have work for which you don’t desire those benefits, then this might be true. But why would this myth be perpetuated? Perhaps because kanban originated in repetitive-task work environments (e.g., automobile manufacturing), unimaginative people assumed that it couldn’t be applied to the modern unique product development world (another flavor of mistaking adoption for maturity patterns). The question I usually ask is “What about kanban leads you to believe it can’t be applied in your current context?”
  • Kanban is useful only at the team level: Flight Levels demonstrates and the Kanban Method elaborates that wherever work is being planned and delivered, kanban is applicable. As Klaus Leopold writes, “teams are usually part of a larger organizational context, which means that the topic of WIP limits should also be understood in this larger context. If the overall company performance — the business agility — should improve … you must limit the work where you want the effect, the value and the benefits of WIP limits to be realized.”
  • Kanban and Scrum are mutually exclusive: Depending on what you’re referring to, this one may be only half wrong. It’s true that kanban is born out of a different paradigm and philosophy (e.g., pull vs. push, adaptive and evolutionary rather than prescriptive). But kanban in the pull-system sense is certainly something that many Scrum teams benefit from. And taking the Kanban Method as the “start where you are” approach to improvement, if your starting point is Scrum, you can certainly design a kanban system to wrap over it and improve service delivery.
  • There is no commitment in kanban: Well, technically, there’s no “commitment” in Scrum, if you want to go down that road. Again, it comes down to what you mean by “commitment” and of course what you mean by “kanban.” I suspect that people who make this claim aren’t really using kanban systems but rather define kanban by one of the two shallow connotations above. In reality, kanban allows teams to make informed forecasts and improves predictability, thereby allowing everyone in the system to make better and more-informed commitments.


Project Concerns vs. Customer Concerns

The traditional “iron triangle” of project management often crowds out our thinking about what we should really be focusing on in product delivery.

Although things like scope (count of functionalities, features), cost and time seem easiest to measure — and therefore become the things we measure and elevate — they really have little to do with product success. Rather, customer concerns like return on investment, fitness for purpose and sentiment are more important things to measure if we are interested in building great products — and they’re actually not as difficult to measure as we might have previously thought (if you’re not convinced, see Douglas Hubbard’s How to Measure Anything).

Image credit: Yoan Thirion

The Eight Stances of a Transformational Leader

Inspired by Barry Overeem’s 8 Stances of a Scrum Master, I have been talking about (and will again at the upcoming Agile Kanban Istanbul and UnlimitedAgility conferences) the eight stances of a transformational leader. I’ll be publishing my own white paper with a fuller explanation, but for now here’s a snapshot.

First, I use the term stance in the sense of “a mental or emotional position adopted with respect to something.” So it’s not a title or a role, but a way of being in a particular context. By transformational leader, I mean simply anyone working in a VUCA environment who, as Amy Edmondson says, plays a role in creating and nurturing the culture we all need to do our best work.

  • Organizational Refactorer: Like refactoring in programming, organizational refactoring is a technique for restructuring an existing work environment by altering its internal structure without changing its external behavior. It’s the act of making small-J-curve, kaizen changes to reduce friction (caused by accidental complexity) and make it easier for people to do their jobs. As the goal of refactoring code is clean code, the goal here is a “clean” organization.
  • Strategy Deployer: This is the act of setting direction and providing clarity of mission to foster organizational improvement in which solutions emerge from the people closest to the problem. The goal is aligned autonomy and a leader-leader culture.
  • Anzeneer: This portmanteau coined by Josh Kerievsky means “safety engineering.” That is, we need to protect people by establishing psychological and physical safety in everything from relationships to workspaces, codebases to processes, products to services.
  • Coach: Coaching is teaching others to be leaders and building an organization that can sustain its success. As Jeffrey Liker says, leaders are responsible for creating an environment in which future leaders can blossom.
  • Environmentalist: An environmentalist holistically (re)creates and stewards the environment in which people grow, with particular awareness of context or organizational “terroir.” Just as a winemaker will fail if he or she tries to grow a certain grape varietal in a place whose conditions don’t suit it, so too will an organizational leader fail if he or she attempts cookie-cutter solutions or installs frameworks or initiatives irrespective of culture or context.
  • Experience Designer: Similar to the experience design of products, this means consciously creating meaningful interactions centered on the employee, with particular focus on intrinsic motivation (mastery, autonomy, purpose); “alleviating people’s problems and bringing them joy.”
  • Experiment Curator: An experiment curator models and celebrates behaviors and creates the environment for employees to learn and share learning through experimentation. It’s basically creating a place for others to learn in.
  • Flow Manager: Taken from the Kanban Method, flow management means optimizing the end-to-end flow of value in a system. Leaders need to make visible and look after wait flow as much as work flow, actively reducing dependencies and managing the system for smooth, fast flow rather than for utilization.
Each stance, in three words:

  1. Organizational Refactorer: Reduce accidental complexity
  2. Strategy Deployer: Leader-Leader culture
  3. Anzeneer: Make safety prerequisite
  4. Coach: Enable over do
  5. Environmentalist: Passion for “terroir”
  6. Experience Designer: Design for engagement
  7. Experiment Curator: Foster learning culture
  8. Flow Manager: Optimize the whole

Project-to-Product Principles

I’ve increasingly been helping organizations looking to go from “projects to product,” so I’ve curated/co-created/stolen a few principles for anyone who wants a tl;dr version of how to go about it. I’m indebted to the work of Jez Humble, Marty Cagan, Matt Lane and John Cutler.

In moving toward product-orientation, we prefer:

  • Outcomes over outputs
  • Solving problems over building solutions
  • Options over requirements
  • Experiments over backlogs
  • Hypotheses over features
  • Customer-validated learning over PO assumptions
  • Measuring value over measuring cost
  • Flow over utilization
  • Optionality over linearity
  • Product vision, strategy, personas and principles over product roadmaps
  • Small-batch delivery over big-batch delivery
  • Optimizing for assumptions being wrong over optimizing for assumptions being right
  • Engineers solving problems over POs dictating requirements
  • Teams of missionaries over teams of mercenaries
  • Business-driven over IT- or PMO-driven

Sticking to the Plan is Not the Solution

[Note: To commemorate the agile manifesto‘s 20th anniversary, this is the first of 12 posts in no particular order on the manifesto’s principles.]

Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.

Manifesto for Agile Software Development

Changes to the plan mid-sprint. Constantly changing priorities. Those words strike fear into the hearts of many team members. One popular agile-health assessment even downgrades as immature those teams whose “plan changes frequently.”

The agile manifesto addresses the idea of change, but it’s not to fear it or to view it as a sign of immaturity — rather the opposite: We are to welcome change and value responding to it over following a plan. So what’s going on?

I think this is a case of confusing the symptom with the problem.

It’s true that constantly changing priorities are often a smell of disorganized product management or unclear strategy. But those dysfunctions need not create additional ones. And it’s quite possible that those changing requirements actually, you know, create competitive advantage. Far too many teams and organizations blame “changing priorities” for their own inability to deliver quality product. I often see diagnoses of teams having difficulty delivering with prescriptions like “Stick to the sprint plan.” Changing requirements is not the problem, so sticking to the plan is not the solution.

The problem is not that the plan changes but that the team or organization lacks the capability to work in such a way as to accommodate change easily.

If changes to the plan result in any of the following things, you’re doing it wrong — and sticking to the plan isn’t going to help you improve:

  • Taking technical shortcuts
  • Working overtime
  • Getting stressed out
  • More defects (leading to the anti-pattern of defect sprints)

I understand why teams and consultants instinctively reach for the “stable plan” panacea: Teams commit to a slate of work, whether it’s a sprint, iteration or program increment, which they’ve often spent time estimating. Stakeholders (including well-meaning scrum masters) exert some amount of pressure on them to deliver what they’ve committed to, heedless of the complex nature of knowledge work. Even when external pressure is low, the very design of a sprint or PI plan implies that success correlates to following the plan; I have encountered many teams who suffer low morale simply because they delivered something different from what they planned, whether it was because they weren’t able to estimate complex work or because a stakeholder changed course midway through the plan.

And of course, teams are often held accountable for others’ — product management’s or HiPPOs’ — decisions. Sometimes, it’s because stakeholders behave badly: They are aware that their demand creates stress and pain for the team, and they refuse to acknowledge and respect a team’s finite capacity — they pile on tasks but refuse to remove any. But often, stakeholders do it out of benign ignorance, lacking a feedback mechanism to inform their choices (or, as Don Reinertsen calls it, “an economic framework”) and assuming that the team will figure out how to handle it. Therefore, it is incumbent upon the team to work in ways that accommodate change more comfortably.

However, the solution often put forward is to “stick to the sprint plan.” This flies in the face of the manifesto value to prefer Responding to Change to Following a Plan. Clearly, when a stakeholder makes a mid-plan reprioritization decision, it’s because of a real or perceived business need (a.k.a. “customer’s competitive advantage”). The question of whether it actually does provide a competitive advantage is secondary here. The team has an obligation to “welcome this changing requirement” whenever it happens. In what universe is it a good business decision to follow through on building a feature that we determine customers don’t want simply because we need to follow a plan?

People don’t really have a problem with changing requirements; we have a problem with being prevented from doing quality work. So the problem isn’t that we don’t want new information but rather how we accommodate it.

What makes it difficult to do that? Many things contribute, but I’ve seen two in particular:

  • Too much work in progress
  • Not enough investment in technical practices

The good news is that both of these things are typically within the team’s control.

Too much work in progress

Regardless of whether you work in a timebox (sprint, iteration, PI), your team — as well as you as an individual — should have control over how many things you work on at once. Just because you’ve “committed” to a plan of delivering 20 work items doesn’t mean you need to start all of them at once. That means stopping the inane practice of having individual team members sign up for work items (or worse, having someone assign them) before they start. Limit the number of in-progress work items to something less than the total number of people in the team. (Reinertsen again: “We will always conclude that operating a product development process near full utilization is an economic disaster.”)
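
Little’s Law supplies the arithmetic behind that advice. A back-of-the-envelope sketch (the numbers are invented): average time in process equals average WIP divided by average throughput, so every started-but-unfinished item lengthens how long everything takes.

    # Little's Law: avg time in process = avg WIP / avg throughput.
    # Illustrative numbers only.
    throughput_per_day = 1.0  # the team finishes about one item per day

    for wip in (20, 5):
        avg_days_in_process = wip / throughput_per_day
        print(f"WIP of {wip:2d} -> each item takes ~{avg_days_in_process:.0f} days to flow through")

    # Starting all 20 "committed" items at once means ~20-day waits; capping WIP
    # at 5 cuts that to ~5 days without anyone working faster.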

This pattern replicates itself at the enterprise level. Organizations are unable to respond to big or urgent demands without significant disruption because they are saturated with “projects.” Most executives that I work with don’t even know how much organizational WIP they have, so for them, making it visible is the first step. But the reality is that when plans become promises and success is measured by conformance to them, we effectively lock in the largest possible batch size, which only exacerbates the problem of dealing with change (and of course is one of the worst things an organization can do).

Not enough investment in technical practices

Talking about XP practices seems so … 2000s. And yet I think that our industry has gotten worse, if anything, with respect to building quality in through things like TDD, pairing, loose coupling, emergent design and continuous integration. Here’s the thing about engineering practices and those stakeholders who are making those changing requirements: They probably don’t have a clue about the practices, and that’s okay. Once you pick up a story, take the time to do it right. The team’s definition of done should include things like proper testing, integration (maybe even deployment!). If it means that the story takes more than two weeks, that’s okay, too. Teams that don’t do these things are reasonably afraid to change the plan because they can’t safely undo (see long-lived branches) or share work (see “not my code”). But here’s the reality: If you avoid doing these things, delivery will never get easier. And, of course, the investment in technical practices will only happen if you limit work in progress.

Other options for better accommodating change:

  • Stop planning in artificial time boxes (whether at the sprint or quarterly level) and instead plan just-in-time using pull (e.g., replenish the Next column based on capacity signals).
  • Stop punishing teams and individuals for not completing their sprint or release “commitments,” including the tracking of the ridiculous sprint-completion rate and other “success” metrics based on conformance to a plan.
  • Spend less time estimating work, as it represents a sunk cost that inhibits change accommodation (and likely has very little information value anyway).
  • Show the cost of interruption. If we’re concerned about undisciplined product owners making careless decisions, make the cost of the decision visible. We can do this with simple flow metrics and post-commit discard rates. If we’re tracking delivery times, we can easily see the “flow debt” that we incur by pausing or dropping work to tend to expedite requests. (Typically this is apparent on a delivery-time frequency chart, manifesting as a bimodal distribution; see the sketch after this list.)
  • Use and make highly visible policies that show how you treat work, such as the selection criteria the team uses to pull work items. For instance, some executives, if they knew that their requests were being treated as “drop everything and work on this now,” would think twice about the requests (I’ve met a few).
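
On the cost-of-interruption point above: a delivery-time frequency chart takes only a few lines to produce. A minimal sketch, with invented delivery times (the second hump is the tell):

    import collections

    # Illustrative delivery times (days) for finished items.
    delivery_days = [3, 4, 4, 5, 5, 6, 6, 7, 18, 19, 21, 22]

    # Bucket into a simple frequency chart with 5-day bins.
    bins = collections.Counter(d // 5 * 5 for d in delivery_days)
    for start in sorted(bins):
        print(f"{start:2d}-{start + 4:2d} days: {'#' * bins[start]}")

    # Two separate humps (a bimodal distribution) suggest a class of items that
    # sat paused while expedite requests cut ahead -- flow debt made visible.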

The bonus is that if we solve the problem rather than the symptom, we’ll actually save time by avoiding daft things like trying to get better at estimating, debating “carryover” stories and coming up with complicated prioritization schemes (just FIFO it). If we’re able to accommodate change gracefully, those remnants of the agile industrial complex go away.

I would even go so far as to aver that being able to accommodate change mid-sprint is actually a fantastic “agile health” metric. After all, if you can’t do that, you won’t be able to do anything very well, whether it’s scaling, product pivoting or increasing throughput, quality or speed. Don’t hide the problem by requiring the team to “stick to the plan.” Fix the underlying issues so that the team can support the organization in what it’s trying to do, which is to exploit variability for economic benefit. Isn’t that the whole idea of business agility?

Flight Levels and Aligned Autonomy

Flight Levels is a thinking model for organizational improvement. As Klaus Leopold says, it “helps you find out where in an organization you have to do what in order to achieve the results that you want.” Flight Levels is effective at that because it stresses the idea of leverage and coherence across the multiple strata and teams of an organization.

In doing so, it provides a way to model and marry strategy development and strategy deployment, effectively fostering a way for leadership at every level to take root and flourish. I’ve spoken about this connection previously, but, until now, I hadn’t fully connected the dots between how Flight Levels can be combined with the “aligned autonomy” matrix to provide a useful way to visualize where an organization is and how Flight Levels can help create a path toward aligned autonomy.

Aligned autonomy is the idea that, rather than the conventional view that alignment and autonomy are opposites, they are actually separate concerns. The insight is credited to the 19th-century Prussian military leader Helmuth Von Moltke and popularized in modern times by Stephen Bungay, who wrote in The Art of Action that

… there is no choice to make (between alignment and autonomy). Far from it, [Von Moltke] demands high autonomy and high alignment at one and the same time. He breaks the compromise. He realizes quite simply that the more alignment you have, the more autonomy you can grant. The one enables the other. Instead of seeing them as the end-points of a single line, he thinks about them as defining two dimensions.

More recently, Henrik Kniberg, in his inimitably accessible style, expanded on the concept by describing the organizational culture types of each quadrant.

In the organizations with which I have worked, the elements of Flight Levels tend to manifest themselves in particular ways that align to these quadrants.

Let’s take each one in turn.

Authoritarian-Conformist

This quadrant is where the highest-level leaders make little to no distinction between “what and why” and “how.” This is the realm of top-down decision-making, going too far into detail about implementation and, because of a misguided centralization of authority, handcuffing the people doing the work who could make better decisions about execution. Meanwhile, because of this obsession with the “how,” these leaders are often derelict in their duty to work “up a level” in strategy, where their contributions are most needed. As a result, these are leader-follower organizations, in which people are trained not to take action and have to ask permission for any decision of importance.

We might depict their Flight Levels as overlapping or compressed at one level; that is, people who should be developing and evolving strategy are too concerned with the day-to-day operations of teams. To use the flight metaphor, these are people who should be flying at the airplane level but can’t remove themselves from the butterfly-level details. Or, as Bungay explains: “Far from overcoming it, a mass of instructions actually creates more friction in the form of noise, and confuses subordinates because the situation may demand one thing and the instructions say another… trying to get results by directly taking charge of things at lower levels in the organizational hierarchy is dysfunctional.”

Von Moltke saw the same behavior in the military:

In any case, a leader who believes that he can make a positive difference through continual personal interventions is usually deluding himself. He thereby takes over things other people are supposed to be doing, effectively dispensing with their efforts, and multiplies his own tasks to such an extent that he can no longer carry them all out.

The demands made on a senior commander are severe enough as it is. It is far more important that the person at the top retains a clear picture of the overall situation than whether some particular thing is done this way or that.

These organizations find it difficult to scale effectively, because their leadership’s inattention to strategy and intrusive concern with implementation details creates a passive leader-follower culture.

The challenge for these organizations then is to use Flight Levels to encourage higher-level leaders to begin to distinguish between the “what and why” and “how,” and focus on setting “directionally correct” strategy while trusting teams and Level 2 Coordination to execute.

Micromanaging-Indifferent

In this quadrant, the concerns of operations, coordination and strategy are variously overlapping, disconnected and/or non-existent. Here we observe:

  • Rampant and invisible WIP
  • Low employee engagement
  • No clear org vision/strategy
  • Siloed, undiscoverable tools
  • Tribal, network-based knowledge
  • Busy but unproductive people
  • Redundant, unshared work

Work in these organizations is perhaps best described by Barry O’Reilly when he says that “When people lack clarity they will optimize for what is in their control, output that is attainable to them but not necessarily the outcomes you want to produce.” To the extent that any measurements exist, activity-based metrics reign here.

The challenge for these organizations is perhaps to simply acknowledge the possible existence of Flight Levels and their relationship to each other. The simple but daunting task of making work visible is a necessary first step.

Entrepreneurial-Chaotic

This is the realm of disconnected teams. They have broad autonomy but little awareness of their relationship to strategy and often of their relationship to each other and the wider end-to-end value stream. In some cases, they do have their own Level 3 Strategy, but they are not unified to a common organizational strategy; they function more as warring fiefdoms under a single name. Sometimes, this organizational culture is the outcome of growth, as we might see in the progress of a startup to a scale-up, in which leadership hasn’t matured commensurately with the new needs of the organization. But it can also occur in the context of a bloated Authoritarian-Conformist organization, whose strictures are too unwieldy to control and where leaders with some authority attempt to break free, making their own plans because it’s the only way they can get work done (e.g., a grey market of tools). In both cases, the work is disconnected from strategy. The organization lacks an ability to see itself from the 30,000-foot view.

People in these organizations generally make lots of decisions on their own, until the decision is somehow related to understanding strategy. Since leadership either keeps strategy closely held or, as is more often the case, doesn’t really have a strategy, this can cause tension, frustration and disengagement, as connection with higher-level purpose is missing. This often extends into career development, as well.

The challenge for organizations in this quadrant is to instantiate Level 2 Coordination and Level 3 Strategy. Starting points can be identifying desired organizational outcomes (Level 3), shifting attention to end-to-end metrics (Level 2), making work visible and using yokoten (lateral deployment) to create awareness.

Innovative-Collaborative

This is the Leader-Leader ideal, which is fostered by clear delineation of concerns at the operational, coordination and strategy levels of Flight Levels. The lightweight but comprehensive modeling of these concerns in Flight Levels provides enough separation of “what and why” from the “how” for people to act autonomously but aligned toward the organization’s desired outcomes:

  • Intent is expressed in Level 3 Strategy in terms of what to achieve and why.
  • Autonomy in Level 1 Operational gives freedom in the actions taken to realize the intent; in other words, freedom in deciding what to do and how.

The “just-enough” strategy ensures that empowered teams and individuals are working on the right things, not merely working on things the right way. As one neutral observer of Von Moltke’s manifestation of the Innovative-Collaborative organization noted:

Every German subordinate commander felt himself to be part of a unified whole; in taking action, each one of them therefore had the interests of the whole at the forefront of his mind; none hesitated in deciding what to do, not a man waited to be told or even reminded.

— Art of Action

Afterword

It’s important to note that many organizations aren’t monolithically characterizable in one single quadrant, nor do they always manifest in one quadrant over time. That is, certain groups in an organization may be Entrepreneurial-Chaotic, while others are Authoritarian-Conformist. Or, while an organization may generally be Authoritarian-Conformist, it may have moments when it exhibits Entrepreneurial-Chaotic behavior.

As a result, it’s helpful to pay attention to those specific behaviors and move in the general direction of aligned autonomy using Flight Levels, realizing that the organization may change in fits and starts.

How to Forecast Before You Even Start

One question that people who are friendly to the probabilistic-forecasting mindset often ask is “I understand how to forecast with delivery data once the project is underway, but how do I forecast before I even start?” Assuming that you absolutely need to know that answer at that point — heed Douglas Hubbard’s advice* — a simple probabilistic way to do it is through reference-class forecasting. Conceptually.org has as good a definition of it as anyone:

Reference class forecasting is a method of predicting the future by looking at similar past situations and their outcomes. Kahneman and Tversky found that human judgment is generally optimistic due to overconfidence and biases. People tend to underestimate the costs, completion times, and risks of their actions, whereas they tend to overestimate the benefits. Such errors are caused by actors taking an “inside view”, assessing the information readily available to them, and neglecting other considerations. Taking the “outside view” — looking at events similar to ours, that have occurred in the past — helps us reduce bias and make more accurate predictions.

An easy metaphor for reference-class forecasting is home sales. We’re trying to forecast something that’s complex — lots of market dynamics involved in something that’s essentially never been done before (how much someone will pay for this particular house at this particular point in time). We use a variety of economic and housing data — zip code, square footage, construction, features — to create a reference class of “comparables.” (If you really want to geek out, see Zillow’s Forecast Methodology.)

Most organizations have delivered some number of projects — maybe not this exact project in this exact tech stack with this exact team, but with attributes that are comparable to it.

An Example

Here’s an example. A company was considering a new initiative. They needed to know approximately how long it would take (time being a proxy for cost but also market opportunity cost). They took the traditional inside-out approach — attempting to predict how long something will take by adding up all known constituent tasks — and estimated it at about a year. This inside-out approach being subject to the Planning Fallacy, we decided to also try a reference-class forecast.

  1. We took a list of the 50 most recent projects, going back a few years. We needed only a pair of dates for each one: When the business officially committed to the project (confirming this commitment during the “fuzzy front end” is often the most tricky bit) and when it went to production.
  2. We then categorized each project by meaningful traits, like project type (legacy or greenfield), team size (small, medium, large) and dependencies (many or few).
  3. We viewed the data on a scatterplot chart.
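
Here’s a minimal sketch of that workflow in Python. The dates, traits and helper names are illustrative, not the client’s actual data; the point is just that two dates and a few tags per project are enough to produce percentile forecasts:

    from datetime import date

    # Step 1: a (commitment date, production date) pair per project;
    # Step 2: categorization traits. All values invented for illustration.
    projects = [
        {"committed": date(2018, 1, 10), "live": date(2019, 4, 2),
         "type": "legacy", "team": "small", "deps": "many"},
        {"committed": date(2019, 6, 1), "live": date(2020, 1, 15),
         "type": "greenfield", "team": "medium", "deps": "few"},
        # ... and so on for the rest of the 50 ...
    ]

    def durations_in_days(projs):
        return sorted((p["live"] - p["committed"]).days for p in projs)

    def percentile(sorted_days, pct):
        # A simple percentile: the duration that roughly pct% of projects finished within.
        return sorted_days[max(0, round(pct / 100 * len(sorted_days)) - 1)]

    # Unfiltered forecast across the whole reference class.
    all_days = durations_in_days(projects)
    for pct in (50, 70, 85):
        print(f"{pct}th percentile: {percentile(all_days, pct)} days")

    # Filtered forecast: only projects resembling the one being considered.
    alike = [p for p in projects
             if p["type"] == "legacy" and p["team"] == "small" and p["deps"] == "many"]
    if alike:
        print("filtered 85th percentile:", percentile(durations_in_days(alike), 85), "days")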

Unfiltered Forecast

You’ll always have a tension between having enough data (small sample sizes can distort) and having relevant data. The good thing about reference-class forecasting is that it’s inexpensive (and better) to run multiple views. First, we ran an unfiltered forecast — all 50 projects.

Unfiltered reference-class forecast

This yielded a high-level view of how long projects take overall. Half of the time (50th percentile), projects finish within 383 days, or a bit more than a year. But that leaves a lot of projects — the other half! — that take longer. How much longer depends on the level of confidence we seek:

  • 50% of the time: in 383 days (a little more than a year)
  • 70% of the time: in 509 days (1.4 years)
  • 85% of the time: in 698 days (nearly 2 years)

Filtered Forecasts

Of course, this new project will be different (as always!), so not all of those projects are really relevant. So we filter based on characteristics similar to the project we’re forecasting: It’s a legacy project, with a small team and many dependencies. We had 12 such projects in that reference class. The confidence intervals for this filtered class are indeed different from those of the entire set:

A filtered reference-class forecast

  • 50%ile: 607 days (1.7 years)
  • 70%ile: 698 days (1.9 years)
  • 85%ile: 776 days (a little more than 2 years)

Those numbers were larger than those for the whole set, so that was disappointing. Maybe we can look at the problem a bit differently. What about legacy projects with many dependencies that were staffed by medium-sized (rather than small) teams? (Perhaps what Troy Magennis said about reducing the effect of dependencies with slightly larger teams was right!) We had 11 such projects:

Wow! That is quite a different story. I guess Troy was right!

  • 50%ile: 303 days (less than a year)
  • 70%ile: 400 days (a little more than a year)
  • 85%ile: 509 days (1.4 years)

We now have three different reference-class forecasts to use. They at least give us some options to inform our thinking (especially as regards team-sizing decisions). Knowing which reference class to use is more art than science, so I like to consider a few options rather than locking into one (especially the one that paints the rosiest picture!).

Once we do get started with the project, we will of course want to do probabilistic forecasting with actual “indigenous” delivery data. But before we even start, we have informed ourselves from the outside-in — averting the Planning Fallacy — on when we might expect this particular new initiative to be done.

* Hubbard says essentially “Of course you need to estimate development costs when making a decision about IT investments. However, you don’t usually need to reduce the uncertainty about those costs to make an informed decision. Reducing the uncertainty about the utilization rate and the likelihood of cancellation of a new system is much more important when deciding how to spend your money.”

Strangler Pattern… for Estimating?

Martin Fowler long ago popularized the metaphorical “Strangler Pattern” (since updated to “Strangler Fig Pattern”) as a more graceful and less risky way to rewrite an existing system. He wrote of the Australian strangler fig plant:

They seed in the upper branches of a tree and gradually work their way down the tree until they root in the soil. Over many years they grow into fantastic and beautiful shapes, meanwhile strangling and killing the tree that was their host. This metaphor struck me as a way of describing a way of doing a rewrite of an important system… An alternative route [to all-or-nothing big-batch replacement] is to gradually create a new system around the edges of the old, letting it grow slowly over several years until the old system is strangled.

When introducing organizations to probabilistic forecasting — which I simply describe as answering the question “When will it be done?” with less effort and more accuracy — the move from traditional estimating can often seem like a similar problem to that of Fowler’s legacy application swap: It’s a big change, fraught with risk, affects a lot of people, and we’re not entirely comfortable with or sure about how it works.

For these reasons, and because most sustainable change is best effected via gradual, safe steps, I guide teams to apply what is essentially the strangler pattern: Keep what you have in place, and simply add some lightweight apparatus around the edges. That is, continue to do your legacy estimating process — estimation meetings, story points, Fibonacci numbers, SWAG and multiply by pi, whatever — and alongside that, let’s start to track a few data points, like commit and delivery dates.

Kanban Method encourages us to “start with what you do now,” and one of the benefits of this approach (besides being humane and not causing unnecessary emotional resistance) is that it helps us understand current processes. It’s quite possible that a team’s current estimating practice “works” — that is, it yields the results that they’re seeking from it. If that goal is to provide a reliable sense of when something will be done, doing the simple correlation of upfront estimate to actual elapsed delivery time will answer that question (spoiler: Most teams see little to no correlation). That in itself can help people see whether they need to change: It’s the Awareness step of the ADKAR model. Continuing existing practice while observing it also helps us decouple and filter out the stuff that is valuable, such as conversation that helps us understand the problem and possible solution. NoEstimates, after all, doesn’t mean stopping all of the high-bandwidth communication that happens in better estimating meetings.
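
As a sketch of that correlation check (assuming you’ve recorded a story-point estimate and the actual elapsed days for each item; the numbers here are invented):

    import statistics

    # Paired observations per work item: upfront estimate vs. actual elapsed days.
    estimates = [1, 2, 3, 3, 5, 5, 8, 8, 13, 13]
    actual_days = [4, 3, 9, 2, 6, 14, 5, 11, 7, 20]

    # Pearson correlation coefficient (statistics.correlation requires Python 3.10+).
    r = statistics.correlation(estimates, actual_days)
    print(f"estimate-to-actual correlation: {r:.2f}")
    # A value near 0 means the estimates carry little predictive signal.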

Meanwhile, we’re collecting meaningful data — the no-kidding, actual dates that show the reality of our delivery system. These are the facts of our system, as opposed to how we humans feel about our work, and as one observer famously noted, “Facts don’t care about your feelings.” But facts and feelings can “peacefully” live alongside each other for a time, just as the strangler fig and host tree do (before the fig kills the host, of course). You can then start running Monte Carlo-generated probabilistic forecasts in the background, which allows you to compare the two approaches. If the probabilistic forecast yields better results, keep using it and gradually increase its exposure. If, for some reason, the legacy practice yields better results, you may choose to “strangle the strangler.” Most groups I work with end up appreciating the “less effort and more accuracy” of probabilistic forecasts, and after a time they start asking “Why are we still doing our legacy estimating practice?” At that point, the strangler fig has killed the host, and all that remains is to safely discard the dead husk.
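
If you’re curious what those background forecasts involve, a Monte Carlo simulation over your tracked throughput can be surprisingly small. A minimal sketch, assuming weekly throughput samples derived from the commit and delivery dates you’ve been tracking (all numbers invented):

    import random

    # Items finished per week, taken from the delivery data we've been tracking.
    weekly_throughput = [3, 5, 2, 6, 4, 3, 7, 4]  # illustrative sample
    remaining_items = 40
    trials = 10_000

    weeks_needed = []
    for _ in range(trials):
        done, weeks = 0, 0
        while done < remaining_items:
            done += random.choice(weekly_throughput)  # resample history
            weeks += 1
        weeks_needed.append(weeks)

    weeks_needed.sort()
    for pct in (50, 70, 85):
        print(f"{pct}% chance of finishing within {weeks_needed[int(pct / 100 * trials) - 1]} weeks")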

So to summarize the strangler pattern for estimating:

  1. Keep doing your legacy estimating practice.
  2. As you do, track delivery dates (commit point through delivery point).
  3. Run a correlation between the two sets of numbers (upfront estimates and actual delivery times).
  4. Continuously run probabilistic forecasts alongside the legacy estimates.
  5. Check the results and either keep the new approach or revert to the legacy.

As with most knowledge work in a VUCA world, whether it’s coding a new system or introducing new ways of working, reducing batch size — of which the Strangler Pattern is a type — offers more flexibility and reduced risk. If you’re interested in a better way of answering the question “When will it be done?” but need to do so incrementally and safely, the strangler pattern for estimating may be an idea to plant. (Sorry, couldn’t resist.)