How Many Runs Would Man City’s Seven Goals Have Been?

After Manchester City scored seven goals in their Oct. 14 match against Stoke City, my first reaction was: Wow, they’re playing some beautiful, unselfish soccer. Being also a baseball fan, my second reaction was: That’s a load of goals — how many runs would that equate to in baseball?

To find out, I used the same technique that we can use for understanding the performance and predictability of our knowledge-work systems, such as software delivery.

First, let’s look at the distribution of goals per team in soccer. Since the new English Premier League season has only just begun, I’ll use the data from 2016-17, the most recent complete season of play:
epl-histogram

From this we can start to understand the likelihood of a seven-goal outburst by a single team. For instance, with 246 occurrences out of 760 total outcomes, a goal total of one is the most likely, at 32.4%. Seven goals happened only once last season, making it about 0.1% likely.

We can do the same for baseball. Let’s look at the runs scored per team for the entire 2017 regular season, which recently concluded:
mlb-histogram
Source: baseball-reference.com

(That 23-run game was when the Washington Nationals beat the Mets by a landslide on Apr. 30.)

To compare these outliers, we could use something like an average with standard deviations away from that. But the data from both the EPL and MLB are not normally distributed, which renders that approach inappropriate. Instead, we’ll use percentiles. Why? As Dan Vacanti writes in When Will It Be Done?:

Percentiles are not skewed by outliers. One of the great disadvantages of a mean and standard deviation approach (other than the false assumption of normally distributed data) is that both of those statistics are heavily influenced by outliers.

A percentile is simply a level that contains a certain percentage of data points. For instance, if I looked at the Premier League data at roughly the 60th percentile (the “one goal” column), that would mean that about 60% of our outcomes were teams who scored one goal or fewer: the combined percentages for zero goals (28.2%) and one goal (32.4%). We could even draw a curve that shows those numbers:
epl-histogram-percentiles
From the Premier League data, we see that the seven-goal outcome doesn’t happen until the 100th percentile, which makes sense because it was the highest-scoring outcome! We have to go all the way to the 100th percentile of possible outcomes to arrive at seven goals.
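As a rough sketch of how such a percentile curve can be derived from a histogram, the snippet below turns outcome counts into cumulative percentages. The function name and the toy counts are illustrative assumptions, not the actual Premier League figures.

```python
from itertools import accumulate


def cumulative_percentages(counts):
    """Map each outcome to the percentage of observations at or below it."""
    total = sum(counts.values())
    outcomes = sorted(counts)
    running = accumulate(counts[o] for o in outcomes)
    return {o: 100 * r / total for o, r in zip(outcomes, running)}


# Hypothetical toy counts (NOT the real EPL data): goals -> number of occurrences
toy_counts = {0: 3, 1: 4, 2: 2, 3: 1}

for goals, pct in cumulative_percentages(toy_counts).items():
    print(f"{goals} or fewer: {pct:.0f}%")
```

The highest outcome always lands at 100%, which is why the top score of a season sits at the 100th percentile by construction.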
So where is the 100th percentile for baseball? Naturally, it will be the highest-scoring run total of the season:
mlb-histogram-percentiles
Now we have our answer! Seven goals, at least from recent data from the English Premier League, is equivalent to 23 runs in Major League Baseball.
Okay, so maybe that wasn’t all that interesting, since all we did was take the top outcome from each league. But using the same approach, we could develop a reference table for all of the scoring outcomes.
Percentile   0%    60%   80%   90%    98%     99%     100%
MLB runs     0-4   5-6   7-8   9-11   12-14   15-22   23
EPL goals    0     1     2     3      4       5-6     7
Reading the table, you can make statements like:
  • In 60% of MLB and EPL games, a team scores six or fewer runs and one or fewer goals, respectively.
  • Seven or eight runs (or fewer) in baseball occur at about the same frequency as two (or fewer) goals in soccer.
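The lookup behind a reference table like this can be sketched as percentile matching: find the cumulative percentage of an outcome in one distribution, then find the smallest outcome in the other distribution that reaches it. This is a minimal illustration; the function names and both sets of counts are hypothetical stand-ins, not the real league data.

```python
def percentile_of(counts, outcome):
    """Percentage of observations at or below `outcome`."""
    total = sum(counts.values())
    at_or_below = sum(n for value, n in counts.items() if value <= outcome)
    return 100 * at_or_below / total


def equivalent_outcome(source_counts, target_counts, outcome):
    """Smallest target outcome whose cumulative percentage reaches the source outcome's."""
    needed = percentile_of(source_counts, outcome)
    for candidate in sorted(target_counts):
        if percentile_of(target_counts, candidate) >= needed:
            return candidate
    return max(target_counts)


# Hypothetical toy counts (NOT real data): score -> number of occurrences
toy_goals = {0: 3, 1: 4, 2: 2, 3: 1}
toy_runs = {0: 1, 2: 2, 4: 3, 6: 2, 9: 1, 15: 1}

for goals in sorted(toy_goals):
    print(goals, "goals is roughly", equivalent_outcome(toy_goals, toy_runs, goals), "runs")
```

As in the article, the top outcome of one distribution always maps to the top outcome of the other, since both sit at 100%.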
We can apply this same approach to our delivery-time data in software delivery, because, like these professional sports, the data is not normally distributed. In fact the distribution of both leagues probably looks a lot like your team’s (graph it and see!). In knowledge work, as in this little exercise, we’re also trying to determine the probability of a single outcome happening, as in when we ask the question: “When might I expect this user story to be finished?” We can answer that question, and then plan, using percentiles, just like we did with the sports scores, like: “We have a 90% confidence that we’ll complete any given next user story in 11 days or fewer.” And like the sports scores, the longer the range in the “tail” the farther it pushes out our highest confidence intervals.
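To make the delivery-time statement concrete, here is a small sketch that reads a percentile off a list of completed-item delivery times using the nearest-rank method. The method choice, the function name and the sample data are assumptions made for illustration; your tooling may interpolate differently.

```python
import math


def days_at_percentile(delivery_days, percentile):
    """Nearest-rank percentile: smallest value with at least `percentile`% of items at or below it."""
    ordered = sorted(delivery_days)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical delivery times (in days) for 20 finished user stories
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9, 10, 11, 14, 30]

for pct in (50, 85, 90, 100):
    print(f"{pct}% of stories finished in {days_at_percentile(sample, pct)} days or fewer")
```

With this made-up sample, the 90th percentile is 11 days while the 100th is 30, which shows how a single long-tail item stretches the highest confidence levels.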

So the next time someone asks you about the likelihood of your favorite sports team — whatever the sport — scoring a certain number, you’ll know what to do — just as you will in your own team when someone asks when to expect a single piece of work to be finished.

Special thanks to Dan Vacanti for the insights from his recent book, When Will It Be Done?


Service-Delivery Review: The Missing Agile Feedback Loop?

I’ve been working for many years with software-delivery teams and organizations, most of which use the standard agile feedback loops. Though the product demo, team retrospective and automated tests provide valuable awareness of health and fitness, I have seen teams and their stakeholders struggle to find a reliable construct for an important area of feedback: the fitness of their service delivery. I’m increasingly seeing that the service-delivery review provides the forum for this feedback.

What’s the problem?

Software delivery (and knowledge work in general) consists of two components, one obvious — product — and one not so obvious — service delivery.  I’ve often used the restaurant metaphor to describe this: When you dine out, you as the customer care about the food and drink (product) but also how the meal is delivered to you (service delivery). That “customer” standpoint is one dimension of the quality of these components — we might call it an external view. The other is the internal view — that of the restaurant staff. They, too, care about the product and service delivery, but from a different view: Is the food fresh, kept in proper containers, cooked at the right temperatures, and do the staff work well together, complement each other’s skills, treat each other respectfully (allowing for perhaps the occasional angry outburst from the chef, excusable on account of “artist’s temperament”!). So we have essentially two pairs of dimensions: Component (Product and Service Delivery) and Viewpoint (External and Internal).
feedback-quad-chart.001
In software delivery, we have a few feedback loops that answer three of these four questions, and we have more-colloquial terminology for that internal-external dimension (“build the thing right” and “build the right thing”):
feedback-quad-chart.002
The problem is that we typically don’t have a dedicated feedback loop for properly understanding how fit for purpose our service delivery is. And that’s often an equally vital concern for our customers, sometimes even more important than the fitness of the product, depending on whether that’s the concern of a delivery team or someone else. (One executive sponsor that I worked with noted that he would rather attend a service-delivery review than a demo.) We may touch on things like the team’s velocity in the course of a demo, but we lack a lightweight structure for having a constructive conversation about this customer concern with the customer. (The team may discuss in a retrospective ways to go faster, but without the customer, they can’t have a collaborative discussion about speed and tradeoffs, nor about the customer’s true expectations and needs.)

A Possible Solution

The kanban cadences include something called a Service-Delivery Review. I’ve been incorporating this to help answer teams’ inability to have the conversation around their service-delivery fitness, and it appears to be providing what they need in some contexts.
feedback-quad-chart.003
David Anderson, writing in 2014, described the review as:
  • Usually a weekly (but not always) focused discussion between a superior and a subordinate about demand, observed system capability and fitness for purpose
  • Comparison of capability against fitness criteria metrics and target conditions, such as lead time SLA with 60 day, 85% on-time target
  • Discussion & agreement on actions to be taken to improve capability
The way that I define it is based on that definition with minor tweaks:
A regular (usually weekly) quantitatively-oriented discussion between a customer and delivery team about the fitness for purpose of its service delivery.
In the review, teams discuss any and all of the following (sometimes using a service-delivery review canvas):
  • Delivery times (aka Cycle/Lead/Time-In-Process) of recently completed work and tail length in delivery-time distribution
  • Blocker-clustering results and possible remediations
  • Risks and mitigations
  • Aging of work-in-progress
  • Work-type mix/distribution (e.g., % allocation to work types)
  • Service-level expectations of each work item type
  • Value demand ratio (ratio of value-added work to failure-demand work)
  • Flow efficiency trend
These are not performance areas that teams typically discuss in existing feedback loops, like retrospectives and demos, but they’re quite powerful and important to having a common understanding of what’s important to most customers. In my experience, they are also the source of some of the most unnecessarily painful misunderstandings. Moreover, because they are both quantitative and generally fitness-oriented, they help teams and customers build trust together and proactively manage toward greater fitness.
feedback-quad-chart.004

Service-delivery reviews are relatively easy to do, and in my experience provide a high return on time invested. The prerequisites to having them are to:

  1. Know your services
  2. Discover or establish service-delivery expectations

Janice Linden-Reed very helpfully outlined in her Kanban Cadences presentation the practical aspects of the meeting, including participants, questions to ask and inputs and outputs, which is a fine place to start with the practice.


Afterword #1: In some places I’ve been, so-called “metrics-based retrospectives” have been a sort of precursor to the service-delivery review, as they include a more data-driven approach to team management. Those are a good start but ultimately don’t provide the same benefit as a service-delivery review because they typically don’t include the stakeholder who can properly close the feedback loop: the customer.

Afterword #2: Andy Carmichael encourages organizations to measure agility by fitness for purpose, among other things, rather than practice adoption. The service-delivery review is a feedback loop that explicitly looks at this, and one that I’ve found is filling a gap in what teams and their customers need.


Afterword #3: I should note that you don’t have to be in the business of software delivery to use a service-delivery review. If you, your team, your group or your organization provides a service of any kind (see Kanban Lens and Service-Orientation), you probably want a way to learn about how well you’re delivering that service. I find that the Service-Delivery Review is a useful feedback loop for that purpose.


[Edited June 12, 2017] Afterword #4 (!): Mike Burrows helpfully and kindly shared his take on the service-delivery review, which he details in his new book, Agendashift: clean conversations, coherent collaboration, continuous transformation:

Service Delivery Review: This meeting provides regular opportunities to step back from the delivery process and evaluate it thoroughly from multiple perspectives, typically:
• The customer – directly, via user research, customer support, and so on
• The organisation – via a departmental manager, say
• The product – from the product manager, for example
• The technical platform – eg from technical support
• The delivery process – eg from the technical lead and/or delivery manager
• The delivery pipeline – eg from the product manager and/or delivery manager

I include more qualitative stuff than you seem to do, reporting on conversations with the helpdesk, summarising user research, etc


What is Fitness for Purpose?

[Note: Lately, I’ve been talking a lot about fitness for purpose and fitness criteria. Other than David Anderson and a few others, though, not much material exists — at least not applied in the software-delivery space — to point people to for further reading. So I’m jotting down some ideas here in the hopes of furthering the discussion and understanding.]

tl;dr

  • The first step in improving is understanding what makes the service you provide fit for its purpose.
  • Fitness is always defined externally, typically by the customer
  • Fitness for purpose has two components: a product component and a service-delivery component
  • Fitness criteria are metrics that enable us to evaluate whether our service delivery and/or product is fit for purpose
  • Of the two major categories of metrics, fitness criteria are primary, whereas health or improvement metrics are derivative
  • Examples of service delivery fitness criteria are delivery time, throughput and predictability

Fitness for purpose is an evaluation of how well a product or service fulfills a customer’s desires based on the organization’s goals or reason for existence. In short, it is the ability of an organization or team to fulfill its mission. The notion derives from the manufacturing industry, where a product is assessed against its stated purpose. That purpose may be determined by the manufacturer or, according to marketing departments, by the needs of customers. David Anderson emphasizes that:

Fitness is always defined externally. It is customers and other stakeholders such as governments or regulatory authorities that define what fitness means.

Fitness criteria then are metrics that enable us to evaluate whether our product, service or service delivery is “fit for purpose” in the eyes of a customer from a given market segment. As Anderson notes, fitness criteria metrics are effectively the Key Performance Indicators (KPIs) for each market segment, and as such are direct metrics.

As Anderson explains,

Every business or every unit of a business should know and understand its purpose … What exactly are they in business to do? And it isn’t simply to make money. If they simply wanted to make money they’d be investors and not business owners. They would spend their time managing investment portfolios and not leading a small tribe of believers who want to make something or serve someone. So why does the firm or business unit exist? If we know that we can start to explore what represents “fitness for purpose.”

For me, fitness is something that, like user stories, can be understood at varying levels of granularity. Organizations have fitness for their purpose: “are we fit to pursue this line of business?” Teams (in particular, small software-delivery teams) also have fitness for their purpose: “are we fit to deliver this work in the way the customer expects?”

Therefore, the first step in improving is understanding what makes the service you provide fit for its purpose. Fitness for purpose is simply an evaluation of how well an organization or team delivers what it is in the business of (its purpose). Modern knowledge-worker organizations like Asynchrony often focus on concerns like product development or technical practices, sometimes overlooking service-delivery excellence. But service delivery is a major reason why our customers choose us. That’s why we attempt to understand and define each project team’s purpose and fitness for that purpose at the project kickoff in a conversation with our customer representatives.

Two Components of Fitness

Fitness for purpose has two components: a product component and a service-delivery component. That is, the customer for your delivery team considers the product that you are building (the what) — did you build the right thing? — as well as the way in which you deliver it (the how) — how reliable were you when you said you’d deliver it? How long did it take you to deliver it? We have useful feedback mechanisms for learning about the fitness of the products we build (e.g., demos/showcases, usage analytics), but how do we learn about the fitness of our service delivery? That’s the service-delivery review feedback loop, which I will write about later.

Fitness Criteria

Fitness criteria are metrics that enable us to evaluate whether our service delivery is “fit for purpose” in the eyes of a customer from a given market segment. These are usually related to but not limited to delivery time (end-to-end duration), predictability and, for certain domains, safety or regulatory concerns. When we explore and establish expectation levels for each criterion, we discover fitness-criteria thresholds. They represent the “good enough” or the point where performance is satisfactory. For example, our customer may expect us to deliver user stories within some reasonable time frame, so we could say that for user stories, our delivery-time expectation is that 85% of the time we complete them within 10 days. We might have a different expectation for urgent changes, like production bug fixes.
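A threshold like “85% within 10 days” can be checked directly against completed work. The sketch below is one simple way to do that; the function name, the sample delivery times and the urgent-fix expectation (2 days, 90%) are hypothetical, and only the 10-day, 85% figure comes from the text above.

```python
def meets_expectation(delivery_days, threshold_days, target_pct):
    """Return (observed % of items within the threshold, whether that meets the target)."""
    within = sum(1 for days in delivery_days if days <= threshold_days)
    observed = 100 * within / len(delivery_days)
    return observed, observed >= target_pct


# Expectations per work type: (threshold in days, target percentage).
# The urgent-fix numbers are an assumption for illustration only.
expectations = {"user story": (10, 85), "urgent fix": (2, 90)}

# Hypothetical delivery times in days for recently completed items
completed = {
    "user story": [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9, 10, 11, 14, 30],
    "urgent fix": [1, 1, 1, 2, 3],
}

for work_type, (threshold, target) in expectations.items():
    observed, ok = meets_expectation(completed[work_type], threshold, target)
    status = "met" if ok else "missed"
    print(f"{work_type}: {observed:.0f}% within {threshold} days (target {target}%): {status}")
```

Keeping the expectation per work type separate reflects the point that urgent changes usually carry a different threshold than standard user stories.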

Although fitness-criteria categories are often common (nearly everyone cares about delivery time and predictability, for instance), the actual thresholds for them are not. While some are shared by many customers, the differences in what people want and expect allow us to define market segments and understand different business risks. Fitness criteria should be our Key Performance Indicators (KPIs), and teams should use those thresholds to drive improvements and evolutionary change.

Who Defines Fitness?

As opposed to team-health metrics, like happiness or pair switches, fitness and fitness criteria are always defined externally: Customers and other stakeholders define what fitness means. That means you cannot ask the delivery team to define its fitness. They cannot know because they are not the ones buying their service or product. We should be asking customers “What would make you choose this service? What would make you come back again? What would encourage you to recommend it to others?”

These are a team’s fitness criteria and these are the criteria by which Asynchrony should be measuring the effectiveness of our teams’ service delivery. Then we’ll be improving toward the goal, the greater fitness for our purpose, both as an organization and as individual delivery teams. By integrating fitness-for-purpose thinking into everything we do, we will create an evolutionary capability that will help us sense changes in market needs and wants and what those different market segments value. As a result, Asynchrony will continue to thrive and survive in the midst of our growth and growing market complexity.

Difference Between Fitness Metrics and Health Metrics

Fitness Metric
  • Metric that enables us to evaluate whether our product, service or service delivery is “fit for purpose” in the eyes of a customer from a given market segment. Effectively comprise the Key Performance Indicators (KPIs) for each market segment.
  • Direct
  • Examples: delivery time, functional quality, predictability, net fitness score
  • Customer-oriented and derived

Health Metric
  • Metric that guides an improvement initiative or indicates the general health of your business, business or product unit or service delivery capability.
  • Indirect/derivative
  • Examples: flow efficiency, velocity, percent complete and accurate, WIP
  • Team-oriented and derived

A Food Example

I like to use food for examples (also to eat). Is a restaurant in the product or service-delivery business? That’s a trick question, of course: The answer is “both.” As a customer, you care about the meal (product) but also about the way you have it provided (service delivery). And those always vary depending on what you want: If you want cheap and fast, like a burger and fries at McDonald’s, you may have a lower expectation for the product (sorry, Ronald) but a higher one for delivery speed. Conversely, if you’re out for fine dining, you expect the food to be of a higher quality and are willing to tolerate a longer delivery time. However, you have some thresholds of service even for four-star restaurants: For example, if you have a reservation, you expect to be seated within minutes of your arrival. And you expect a server to take your order in a timely way. If you don’t have a reservation, the maitre d’ or hostess will perhaps quote you an expected wait time; if it’s unacceptable, you’ll go elsewhere. If it’s acceptable but they don’t seat you in that time, you are dissatisfied. The service delivery was not fit for its purpose, which is to say the reason why you chose to eat there.

A Software-Delivery Example

The restaurant experience is actually not too dissimilar from software delivery. The customer expects software (product) but also expects it on certain terms or within certain thresholds (service delivery). A team works hard to deliver the right features and demonstrates them at some frequency; at the demo, the team likely will explicitly ask “is this what you wanted?” What’s often missing is the “are these the terms on which you wanted it?” Whether in the demo or a separate meeting, we need to also review service delivery. This is where we look at whether our service meets expectations: Did we deliver enough? Reliably enough? Respond to urgent needs quickly enough? The good news is that we can quantitatively manage the answers to these questions. Using delivery times, we can assess whether the throughput is within a tolerance. One team used a probabilistic forecast and found that their throughput was not likely to help them reach their deadline in time. Conversely, another realized that they were delivering too fast and could stand to reallocate people to other efforts. Also, for instance, when we set up delivery-time expectations (some people call these SLAs), like delivering standard-urgency work at a 10-day, 85% target, we can then make decisions based on data rather than feelings or intuition (which have their place in some decisions but not others). These expectations needn’t be perfect or “right” to begin; set them and begin reviewing them to see if they are satisfactory.
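The text does not say which forecasting method that team used. One common option is a Monte Carlo simulation that resamples past weekly throughput, sketched below. The function name, parameters and sample history are all assumptions for illustration.

```python
import random


def chance_of_finishing(weekly_throughput, backlog_size, weeks_left, trials=10_000, seed=1):
    """Estimate the probability of finishing `backlog_size` items within `weeks_left` weeks.

    Each trial draws one past week's throughput at random for every remaining week.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    successes = 0
    for _ in range(trials):
        done = sum(rng.choice(weekly_throughput) for _ in range(weeks_left))
        if done >= backlog_size:
            successes += 1
    return successes / trials


# Hypothetical history: items finished in each of the last eight weeks
history = [2, 4, 3, 0, 5, 3, 2, 4]

probability = chance_of_finishing(history, backlog_size=30, weeks_left=8)
print(f"Chance of finishing 30 items in 8 weeks: {probability:.0%}")
```

A low probability suggests a conversation with the customer about scope or dates; a very high one may mean, as with the second team mentioned above, that people could be reallocated.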

Having an explicit review of fitness criteria, especially for service-delivery fitness, is a vital feedback loop for improving. Rather than having the customer walk away dissatisfied for some unknown reason, we can proactively ask and manage those expectations and improve upon them. Often these are the unstated criteria that ultimately define the relationship and create (or erode) trust; discover them and quantitatively manage them.


Asynchrony’s First-Ever Internal Conference

Among the many exciting things happening at Asynchrony this year, one of my favorites is our first-ever internal conference, coming July 15. I’m a big fan of organizations that take time to learn and share their learning. Especially given that Asynchrony is growing and establishing new offices, it’s vital that we share learning across offices and invest in the personal relationships that make the organization what it is. The conference goals are:

  • Increase the value of the time invested by targeting information sharing.
  • Increase knowledge sharing and interactions between individuals and teams.
  • Provide opportunities for our employees to create and present a session for their colleagues.

The conference will be a mix of 50-minute sessions, an exhibit floor with 15-20 booths for delivery teams and functional groups (aka chapters and guilds) and open space. To fill the sessions, we made an open call for proposals in the organization, with a small selection team to decide which ones ultimately made the cut based on:

  • Good variety of information presented
  • Relevance to our current and future business success
  • Interest from the company in the presentation content (popular vote/survey)
  • Enough mix of technical and non-technical topics so there will be multiple sessions that non-technical people can attend and get value (this means that non-technical topics are probably more likely to be selected!)
  • Highlighting employees who have not already been featured in front of the company (expecting there to be a mix of both)
  • Promoting creativity of topic and presentation content/activities

We had around 40 people propose more than 50 sessions. The selected sessions are intriguing, with something for everyone, and it is certainly a conference I’m looking forward to attending!

  • Anarchism at Asynchrony: Lessons from the Left in Building Self-Organized Teams (Brian Coalson)
  • Asynchrony Culture and You! (Andrew Rauscher and Wes Ehrlichman)
  • Battling Unconscious Bias (Neem Serra)
  • Building a serverless backend on AWS (Eric Neunaber)
  • Denver: Self Management and our Future (Jim Mruzik and Don Peters)
  • DevOps Culture (Matt Perry)
  • Getting to know Node.Js (Josh Hollandsworth)
  • Go (Jason Riley)
  • Improving Communication Skills with Analogies and Metaphors (Rose Hemlock and J LeBlanc)
  • Intro to Unity 3d (Westin Breger)
  • Introduction to Functional Programming (Kartik Patel)
  • Mobile Monsters – Develop Your Mobile App Test and Quality Strategy (Linda Sorrels and Mary Jo Mueller)
  • Password Hashing and Cracking (Micah Hainline)
  • Plan Bee – Using The Raspberry Pi to Help Bees (Dave Guidos)
  • Risk analysis and RFC 1149 (Alison Hawke)
  • Scaling Staffing at Asynchrony (Nate McKie)
  • The Meaning of Dub Dub:  Where Apple is taking us in 2016 and beyond (Nick McConnell, Mark Sands, James Rantanen, Jon Hall, Henry Glendening)
  • UX Process (Lee Essner)
  • Who Matters and What Matters To Them (David Lowe)

Introducing: The NoEstimates Game

I’ve been play-testing a new simulation game that I developed, which I’m calling the NoEstimates Game. Thanks to my friends and colleagues at Universal Music Group, Asynchrony and the Lean-Kanban community (Kanban Leadership Retreat, FTW!), I’ve gotten it to a state in which I feel comfortable releasing it for others to play and hopefully improve.

The objective is to learn through experimentation what and how much different factors influence delivery time.

[Jan. 3, 2017 update: Game materials are now available on GitHub]


Download these materials in order to play:


If you’d like to modify the original game elements, here they are:

I’m releasing it under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, so please feel free to share and modify it, and if possible, let me know how I can improve it.


Software-Delivery Metaphors from Basketball

Basketball practice, circa 1925, courtesy Seattle Municipal Archives. Then, as now, the WIP limit in the game is exactly one.

March has begun, and that means March Madness, when many Americans turn their attention toward basketball, with unbridled hopes in NCAA tournament brackets and the thrills of underdog upsets. Basketball offers not only the promise of nail-biting college tilts but also some helpful metaphors for software delivery. Read on for an assist for your team!

Practice Your Free Throws

I was never a star basketball player (I made it only as far as the high-school sophomore team), so I was keenly aware of my need to practice in order to build my skills. For instance, I always found that if I were having trouble making field goals (which was not infrequent!), it helped to practice free throws. With no jumping involved and no one defending me, this allowed me to simplify my form and focus on the basics. I reasoned that, if I couldn’t hit a free throw, I had no business trying longer-range shots in complex situations. Even now, I still can’t figure out why players who take most of their shots from behind the three-point line can’t seem to reliably make free throws.

The same is true in software delivery. For instance, before you can realize the goal of continuous delivery, you need to discipline yourself in automated testing and continuous integration. Be able to reliably answer in the affirmative Jez Humble’s three questions:

  • Does everyone check into mainline (at least) once per day?
  • Do you have a suite of tests to validate your changes?
  • When the build breaks, is fixing it the team’s #1 priority?

If your team aspires to continuous delivery, you can’t keep chucking up the same code or try to do it in the midst of delivery commitments and deadlines with a bolted-on “devops team.” You need to slow down in order to speed up — take time to write tests at the proper levels and integrate continuously. If your throughput is lower to begin, so be it. It’ll be higher in the long run.

Planning Flow During the Timeout

If I had scored a point for every standup with the report-to-the-leader anti-pattern that I’ve witnessed, I’d have made varsity. I understand the accountability idea behind Scrum’s three questions, but I have rarely seen it implemented in practice in a healthy way. The standup tends to be rote, individual-oriented, low-energy and low-value, with teams sometimes abandoning them for “real work.”

Contrast this with a timeout in basketball. It’s fast, full of energy and purposeful. Why? The timeout is focused on how the team can work together in the next short period of play. That’s it. Imagine if the coach went around the circle demanding that each player describe what he had been doing:

  • “Well, coach, I missed two jumpers but made a free throw.”
  • “I’ve been guarding #18. Still plan to guard him after the timeout.”
  • “I’ve been running up and down the court. No blockers.”

Anyone who has been watching the game knows these things! Likewise, we were all in the office yesterday; we know you’ve been working. In a timeout, individuals don’t report status; the team proactively solves its main impediments to flow:

  • “They’re double-teaming me, coach. That means someone is going to be free — let’s get Christopher or James the ball more.”
  • “I can’t keep up with #18 — Chike, can you drop down and help me guard him in the low post?”

In a timeout, conversation is lively and self-organizing; no one waits to be called on. When the timeout ends, the team runs back onto the court knowing the plan. Does your software-delivery team know the plan when standup ends? Treat standup (a.k.a. daily flow planning) more like a basketball timeout, and orient your standups toward the team and flow.

A Whole-Team Approach to the 7-foot Constraint

That brings us to one last metaphor from basketball: System constraints are like a defense that you have to dynamically figure out. The Theory of Constraints tells us that every system has a constraint that governs its output. In basketball, this constraint is sometimes easy to spot, whether it’s the 7-foot dude who is blocking everyone’s shots, or your point guard who keeps turning the ball over. In basketball, both on the playground and on elite NCAA courts, teams adapt to their constraints. It happens so fast in basketball that we don’t even think about it: If the 7-foot dude blocks shots from close range, a coach may deploy a lineup of better perimeter shooters or a player who is quicker and can draw fouls from the big man. Another example is a double-team situation: The team doesn’t expect a double-defended player to try to keep scoring — no, the team comes to help him, since one player is usually free. Basketball players do this almost instinctively, because they share a common goal: Score more points than the opponent.

In knowledge work, constraints are more difficult to see, and a lack of goal-orientation inhibits a whole-team approach to the constraints. For example, if a person is “free,” it’s easy for a dev to pull in new work, heedless of how busy or “double-teamed” the QA is. That’s why we use WIP limits and make our constraints visible with tools like cumulative-flow diagrams. (In basketball, the WIP limit is one: It’s called the ball. When your teammate is double-teamed and you are unguarded in the open, you don’t grab another ball from the sidelines and start playing, do you?) Just as basketball players naturally practice the art of work leveling by constantly taking a whole-team approach to constraints, we in software development can do the same. We merely need the help of simple job aids and a shared goal, which doesn’t mean staying busy as individuals, but means finishing work.


How to Facilitate a SAFe Big-Room Planning Meeting

I recently facilitated a two-day quarterly release-planning meeting for a customer. They were keen to use the Scaled Agile Framework-style “big-room” approach to the planning, so I attempted to support that as best I could. I’ll reserve commentary on SAFe itself (perhaps for a future post), but here I’ll describe some of the highlights and keys to the success of their meeting from my vantage point as facilitator.

Background

This customer is embarking on a mission-critical project that includes teams (business and delivery) in no fewer than six locations, spans nine hours of time zones and integrates at least four legacy systems. Many business and delivery team members had never even met each other in person. And most are relatively new to agile ways of planning and delivering.

The Challenge

The main deliverable in a SAFe big-room planning session is a Program Board, which is essentially a sprint-by-sprint breakout of the work for all of the teams in the Program Increment. It can depict dependencies and assumptions, but it’s basically the plan for the quarter (five two-week sprints or so). As I remarked to the group in my introduction, the plan we would create would be wrong because we’re trying to predict the future. But we would plan in such a way as to be adaptable as soon as we realized that the plan was wrong. The plan would include the What (what we’re meant to build and why we’re building it) as well as the How (the technical approaches, working agreements and roles of the people doing the work). We mainly followed the SAFe-recommended agenda, though as facilitator I did add a couple of things (which I outline below) and shaped the form of some of the SAFe elements.

In addition to creating a plan, the group really needed to establish a shared understanding of the business context as well as of each other. With so many different people — including different native languages — it’s easy to distrust and make assumptions, so getting the group to bond as a unified team was one of my “subversive” goals.

Retrospecting

(Photo: Closing retrospective.)

We retrospected at the end of each day. Facilitating a retrospective for nearly 50 people requires adapting the approach from that of a typical small delivery team. At the end of Day 1, I invited the group to split into mixed groups of up to eight people (following the rule of “Up to eight, collaborate”) and gave each an easel-sized sheet with categories of Stop, Start, Continue and Puzzles. Then I asked each group to have one person facilitate a discussion and decide on a couple of actions that they would like to suggest to the whole group. I then called the entire group together and had the “lay facilitators” come forward and present their ideas. We did a simple Roman vote across the room for each and amended the working agreements accordingly. As is usually the case in any retrospective, the best part was the conversation that ensued through sharing.

 

Keys to success

From my perspective, as well as from anecdotal and retrospective feedback, the following were keys to our success:

  • The right people (a.k.a. business and technical implementation people in the same room): When you’re dealing with distributed teams and trying to deliver important business value, having the business people there to contextualize and simply interact with the technical implementation people is as valuable as the plan that they create. Far more important than any delivery methodology is a foundation of trust and understanding, and this group laid that foundation by having the true business stakeholders and users in the room, and not merely for a token pep talk at the beginning. Business people talked about Minimal Viable Product releases and interacted with the delivery teams throughout the two days. And technical people gave a couple of product demos to the business, which yielded understanding and new ideas.

    (Photo: RAID bingo.)

  • Strong facilitation and self-managing: The group never would have made the progress it did without disciplined commitment to its Working Agreements, which we outlined at the beginning. They were:
    • Check-in, Check-out protocol: I can’t say enough about how useful this is for a meeting of any size.
    • “Hands” rule: Since we often had lots of concurrent conversations happening (in team breakout planning, etc.), we were frequently loud. The old “hands” rule, in which one person holds up a hand and stops talking, creating a knock-on effect whereby everyone else follows suit, allowed us to come to focus within 15 seconds of when I raised my hand.
    • Be on time: To facilitate this, I projected a giant countdown clock during breaks, and walked around holding a “5 minutes remaining” card while people were in lively breakout groups.
    • “Yes, and…”: Borrowed from the improv world, this was a simple attitude that disposed us toward active listening and affirming what the other person was saying, so that we might build collaboratively rather than shut down conversation. Since most people were not previously familiar with the concept, we did the “Yes, and…” warmup as our icebreaker on Day 1.
    Setting these agreements out upfront and “enforcing” them early allowed the group to self-manage and own the agreements, so that I had to really “intervene” only a handful of times. (And yes, we actually finished on time.)
  • Variety: No one likes sitting through back-to-back PowerPoints, struggling to stay awake; two full days of intense planning and thinking is difficult enough. So we mixed up the style of presentations and planning sessions using a variety of formats:
    • Games (or as Luke Hohmann would say, “collaborative frameworks”) for quick, collaborative discovery, like RAID Bingo
    • Lean Coffee for working through the Parking Lot/Car Park
    • Open Space for topics like deeper dives into the business, working agreements and tradeoff sliders
    • Warmup games for initiating conversation and creativity
  • Rearranging the space: We started with round tables of eight, but throughout each day we reconstructed the space, moving tables to create a circle of chairs, moving a couple of tables out of the room altogether and creating standing open spaces that encouraged movement around the room. And of course, we had a designated “checkout table” in the back.
  • Informative workspace: As is typically the case in release-planning meetings, we used loads of information radiators on the walls. This allowed us not only to share and persist the decisions and questions but also to create an engaged group, because we were all able to collaborate (a group of 50 physically can’t collaborate around a monitor or even a projection screen). By having the Program Board on one wall, the Story Map on another and Team Boards all around, the room had no “front” but became a “theatre in the round,” which itself fostered collaboration and engagement.

    (Photo: We used star stickers to indicate dependencies. These were the notecards that served as legends for reminding what color went with each team.)

  • Celebration and fun: Music, photo slideshow at the end, appreciations (gratitude board).

For next time

One bit of feedback was that we could’ve used one more day. I think that’s fair but will depend on how much face-to-face time a group has with each other (in this case, it’s warranted). And as focused and on-schedule as we were, one improvement I’d like to make for next time is to state the objective/outcome and people needed for each agenda card. Other feedback was to ensure enough space in the venue (we were cramped at times, so we removed a couple of tables on Day 2).

Summary

If you do it right, you get the stated “deliverable,” which is the plan. It might even be a pretty good plan (though all plans are guesses and therefore wrong!). But you also get something much longer-lasting and equally valuable: the team building that fosters trust and communication, which in turn fosters collaboration and shared understanding. I can’t tell you how many times I heard people exclaim, “Wow, I didn’t know that you did it that way in North America!” or “That really clarifies things for me!”, or how deeply gratifying that was. Having the right people in the room, strong facilitation and engagement are the keys.