How Many Runs Would Man City’s Seven Goals Have Been?
Posted: October 24, 2017  Author: Pip  Filed under: software development  Tags: forecasting 
After Manchester City scored seven goals in their Oct. 14 match against Stoke City, my first reaction was: Wow, they’re playing some beautiful, unselfish soccer. Being also a baseball fan, my second reaction was: That’s a load of goals — how many runs would that equate to in baseball?
To find out, I used the same technique that we can use for understanding the performance and predictability of our knowledge-work systems, such as software delivery.
First, let’s look at the distribution of goals per team in soccer. Since the new English Premier League season has only just begun, I’ll use the data from 2016-17, the most recent complete season of play:
From this we can start to understand the likelihood of a seven-goal outburst by a single team. For instance, with 246 occurrences out of 760 total outcomes, a goal total of one is the most likely, at 32.4%. Seven goals happened only once last year, making it 0.1% likely.
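The arithmetic behind those percentages is just occurrences divided by total outcomes. Here is a minimal sketch in Python; the function name `likelihood` is my own, and the only data in it are the counts quoted above.

```python
def likelihood(occurrences: int, total_outcomes: int) -> float:
    """Return how often an outcome occurred, as a percentage of all outcomes."""
    if total_outcomes <= 0:
        raise ValueError("total_outcomes must be positive")
    return 100 * occurrences / total_outcomes


# Counts quoted in the post: 760 team outcomes in the 2016-17 season.
one_goal = likelihood(246, 760)
seven_goals = likelihood(1, 760)

print(f"one goal:    {one_goal:.1f}%")     # 32.4%
print(f"seven goals: {seven_goals:.1f}%")  # 0.1%
```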
We can do the same for baseball. Let’s look at the runs scored per team for the entire 2017 regular season, which recently concluded:
(That 23-run game was when the Washington Nationals beat the Mets by a landslide on Apr. 30.)
To compare these outliers, we could use something like the mean and the number of standard deviations each outlier sits away from it. But the data from both the EPL and MLB are not normally distributed, which renders that approach inappropriate. Instead, we’ll use percentiles. Why? As Dan Vacanti writes in When Will It Be Done?:
Percentiles are not skewed by outliers. One of the great disadvantages of a mean and standard deviation approach (other than the false assumption of normally distributed data) is that both of those statistics are heavily influenced by outliers.
A percentile is simply a level that contains a certain percentage of data points. For instance, the “one goal” column of the Premier League data sits at about the 61st percentile, which means that about 61% of our outcomes were teams that scored one goal or fewer (the sum of the percentages for zero goals, 28.2%, and one goal, 32.4%, is 60.6%). We could even draw a curve that shows those numbers:
From the Premier League data, we see that the seven-goal outcome doesn’t happen until the 100th percentile, which makes sense because it was the highest-scoring outcome! We have to go all the way to the 100th percentile in terms of likelihood of possibilities to arrive at seven goals.
So where is the 100th percentile for baseball? Naturally, it will be the highest-scoring run total of the season:
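If you want to build that cumulative view yourself, a rough sketch might look like the following. The helper name `cumulative_percent` is my own, and the sample is a made-up set of ten scores chosen only so the numbers are easy to follow; it is not the real Premier League data.

```python
from collections import Counter


def cumulative_percent(scores):
    """Map each distinct score to the percentage of outcomes at or below it."""
    counts = Counter(scores)
    total = len(scores)
    running = 0
    result = {}
    for value in sorted(counts):
        running += counts[value]
        result[value] = 100 * running / total
    return result


# Toy sample (invented for illustration, NOT real league data).
toy_goals = [0, 0, 0, 1, 1, 1, 2, 2, 3, 7]

for goals, percent in cumulative_percent(toy_goals).items():
    print(f"{goals} goals or fewer: {percent:.0f}% of outcomes")
```

In this toy sample, one goal or fewer covers 60% of outcomes, and the top score only shows up at 100%, which is the same pattern as in the real data.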
Now we have our answer! Seven goals, at least from recent data from the English Premier League, is equivalent to 23 runs in Major League Baseball.
Okay, so maybe that wasn’t all that interesting, since all we did was take the top outcome from each league. But using the same approach, we could develop a reference table for all of the scoring outcomes.

Percentile   0%    60%   80%   90%    98%     99%     100%
MLB runs     0-4   5-6   7-8   9-11   12-14   15-22   23
EPL goals    0     1     2     3      4       5-6     7
Reading the table, you can make statements like:
 In 60% of MLB and EPL games, a team scores six or fewer runs and one or fewer goals, respectively.
 Seven or eight runs (or fewer) in baseball occur at about the same frequency as two (or fewer) goals in soccer.
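One way to sketch the lookup behind a table like this is to find a score's percentile in one league and then take the lowest score in the other league that reaches the same percentile. The function names and both samples below are assumptions of mine, invented for illustration; this is not the real league data or necessarily how the table above was built.

```python
from fractions import Fraction


def share_at_or_below(score, scores):
    """Exact fraction of outcomes that are less than or equal to score."""
    return Fraction(sum(1 for s in scores if s <= score), len(scores))


def equivalent_score(score, source_scores, target_scores):
    """Lowest target score whose percentile reaches the source score's percentile."""
    rank = share_at_or_below(score, source_scores)
    for candidate in sorted(set(target_scores)):
        if share_at_or_below(candidate, target_scores) >= rank:
            return candidate
    # Not reached for non-empty input: the top score always sits at 100%.
    return max(target_scores)


# Toy samples (invented for illustration, NOT real league data).
toy_goals = [0, 0, 0, 1, 1, 1, 2, 2, 3, 7]
toy_runs = [1, 2, 3, 4, 5, 6, 7, 9, 12, 23]

for goals in (1, 2, 7):
    runs = equivalent_score(goals, toy_goals, toy_runs)
    print(f"{goals} goals is roughly equivalent to {runs} runs")
```

Using exact fractions rather than floating-point percentages avoids rounding surprises when two percentiles are meant to be equal.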
We can apply this same approach to our delivery-time data in software delivery because, as in these professional sports, the data is not normally distributed. In fact, the distribution of both leagues probably looks a lot like your team’s (graph it and see!). In knowledge work, as in this little exercise, we’re also trying to determine the probability of a single outcome happening, as when we ask the question: “When might I expect this user story to be finished?” We can answer that question, and then plan, using percentiles, just as we did with the sports scores: “We have 90% confidence that we’ll complete any given next user story in 11 days or fewer.” And as with the sports scores, the longer the “tail” of the distribution, the farther out it pushes our highest-confidence percentiles.
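As a rough sketch of how you might pull that kind of statement out of your own data, here is a nearest-rank percentile in Python. The nearest-rank method is one common choice among several, the function name is my own, and the delivery times are made up for illustration.

```python
def percentile_nearest_rank(values, percent: int):
    """Smallest value that at least `percent` percent of the data is at or below."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    ordered = sorted(values)
    # Integer ceiling of percent * n / 100, avoiding floating-point rounding.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical delivery times in days for ten finished user stories.
toy_delivery_days = [2, 3, 3, 4, 5, 5, 6, 8, 11, 21]

for percent in (50, 90, 100):
    days = percentile_nearest_rank(toy_delivery_days, percent)
    print(f"{percent}% of stories finished in {days} days or fewer")
```

Notice how the single 21-day story in this toy data drags the 100th percentile far beyond the 90th, which is the tail effect described above.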
So the next time someone asks you about the likelihood of your favorite sports team — whatever the sport — scoring a certain number, you’ll know what to do — just as you will in your own team when someone asks when to expect a single piece of work to be finished.
Special thanks to Dan Vacanti for the insights from his recent book, When Will It Be Done?