On 24 October the BBC reported that Arsenal manager Arsène Wenger had told delegates at Arsenal FC's annual general meeting: "After nine games we have 20 points, which means the championship will be decided between 82 to 86 points." This seems a remarkably precise prediction. How did Wenger come up with it? Did he draw on deep insights into the workings of football competitions? Or did a team of analysts at Arsenal FC crunch EPL statistics to determine how many points Arsenal need to win the title this year?
My guess is that Wenger did something much simpler. The Arsenal manager may have looked at the EPL standings after nine weeks (Table 1) and calculated as follows: The leaders have 20 points after 9 games. That's 2.22 points per game. Multiplying this by the total number of games in a season (38) gives 84.44 points. Round down to 84, allow a margin of error of two points either way, and you have an estimate: 82 to 86 points for the winner at the end of the season. It's the sort of back-of-the-envelope calculation that would come naturally to a man with an economics degree to his name.
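The calculation I am guessing at can be written out in a few lines. This is only a sketch of my conjecture, not something Wenger is known to have done; the function name and the two-point margin are illustrative assumptions.

```python
def extrapolate_final_points(points_so_far, games_played, games_in_season=38):
    """Scale the current points-per-game rate up to a full season."""
    return points_so_far / games_played * games_in_season


projection = extrapolate_final_points(20, 9)  # 20 / 9 * 38
centre = int(projection)                      # round down to a whole number
margin = 2                                    # assumed margin of error

print(f"{projection:.2f} -> {centre - margin} to {centre + margin} points")
# prints "84.44 -> 82 to 86 points"
```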
Table 1. Premier League table after nine matches of the 2016/17 season
Claudio Ranieri, manager of last year's surprise champions Leicester City, was asked to comment on Wenger's estimate. With characteristic humility, he appeared to disagree: “This season there are so many teams fighting for the title, no? […] If Arsène says 86, he's right. I believe a little less, but he's the boss.”
As Table 1 shows, after nine weeks of this season, the five leading teams were separated by only a single point. Ranieri's idea that the winning points tally should be lower in a season with several title contenders makes sense: these contenders would all tend to take points from each other, thus lowering the final tally of the winning team relative to a season where a single team dominates and beats almost everyone else.
So how many points will it take to win the English Premier League title this year? Can this be predicted at all after the first nine weeks of the season? We can use statistics to test the hypotheses put forward by two giants of English football, and to come up with our own estimates. From a statistical point of view, we may ask three questions:
- Can a statistical model predict the points tally of the eventual EPL winners from the leader's number of points after nine matches, and does this prediction agree with Wenger's estimate?
- Can we improve on the above prediction if we also take account of the points of other title contestants after week 9, as Ranieri suggests we do?
- Are the points of the leading teams all that matters, or are there other features of the nine-week table that can contribute to a prediction of the winning tally?
In order to answer these questions, I considered the 21 previous EPL seasons that were contested by 20 teams [1]. These are the seasons from 1995/96 to 2015/16. This period almost coincides with Wenger's reign as manager of Arsenal. Data were taken from the official website of the English Premier League (www.premierleague.com) [2].
1. The Wenger hypothesis: Predicting the winning tally from leader's points after nine weeks
Figure 1 shows the relationship between the points tally of the EPL leaders after nine matches (let's call them the “9w leaders”) and the number of points gained by the winning team at the end of the season (“winning tally”). Note that the 9w leaders and the eventual winners may or may not be the same team.
There is a clear correlation: the more points the 9w leaders had, the higher the title-winning tally. Looking at the past 21 years, only three winning tallies fall into the range of 82 to 86 points predicted by Wenger: those of the 95/96, 02/03, and 09/10 seasons. The title was won by Chelsea in 09/10 and by Manchester United in the other two seasons. After nine matches, the leaders in these three seasons (Newcastle, Arsenal, and Manchester United, respectively) had accumulated between 22 and 24 points, which is more than the 20 points of this year's 9w leaders, Manchester City. Interestingly, there has not been a previous year where the 9w leaders had exactly 20 points.
Figure 1. Relationship between leading team's points after nine weeks, and EPL winner's final points tally
A more sophisticated way of using the information is to develop a statistical model that predicts the final winning tally from the 9w leader's tally in the same year, using all available years. I used simple linear regression to do this. For 9w leaders on 20 points, this yields a predicted winning tally of 82.4, just within the range of Wenger's estimate, albeit at the lower end. However, the model is less confident than Wenger about the precision of this prediction: an 80% prediction interval for this model suggests that the EPL will be won with somewhere between 76.3 and 88.6 points. This is a pretty wide interval. The statistical model is being cautious, partly because with just 21 seasons to go on, it is not yet very confident of its own accuracy, and partly because there is only a moderate relationship between the points of the 9w leaders and the winning tally, leaving much room for error.
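For readers who want to see the mechanics, here is a minimal sketch of a simple linear regression with a prediction interval, using the standard textbook formula. It is not the code used for this article: the function name is hypothetical, the data are made up for illustration, and the critical value of the t distribution is passed in by hand (about 1.328 for an 80% interval with 21 seasons, i.e. 19 degrees of freedom).

```python
import math


def fit_and_predict(x, y, x_new, t_crit):
    """Least-squares line y = a + b*x, plus a prediction interval at x_new.

    t_crit is the Student-t quantile for the chosen coverage with
    len(x) - 2 degrees of freedom.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual standard error, using n - 2 degrees of freedom
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))

    # Standard error for a single new observation at x_new
    se_pred = s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)

    y_hat = intercept + slope * x_new
    return y_hat, y_hat - t_crit * se_pred, y_hat + t_crit * se_pred


# Made-up numbers for illustration only; NOT the EPL data used in the article.
leader_points = [19, 21, 22, 23, 25]
winning_tally = [80, 82, 86, 87, 91]

# With 5 data points there are 3 degrees of freedom; the 0.90 quantile is about 1.638.
prediction, lower, upper = fit_and_predict(leader_points, winning_tally, 20, t_crit=1.638)
print(f"predicted {prediction:.1f}, 80% interval {lower:.1f} to {upper:.1f}")
```

The term `1 + 1/n + (x_new - x_bar)**2 / sxx` is what makes the interval wide: it accounts for the scatter of individual seasons around the line as well as for the uncertainty in the fitted line itself.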
2. The Ranieri hunch: Predicting the winning tally from several title contenders' points
Can we improve on our prediction by listening to Ranieri? To test this, I estimated two further models, taking into account the points tallies of the top three and the top five teams, respectively, after nine weeks. The results do not agree well with Ranieri's hunch. The “Top Three” model changes the prediction upward, not downward: the predicted winner's tally is 84.1 points, with an 80% prediction interval from 77.8 to 90.3. The “Top Five” model, in turn, predicts a winning tally of 82.6 points, with an 80% prediction interval from 75.6 to 89.5. Importantly, there is no evidence that adding the data from teams placed second to fifth improves the prediction beyond what the 9w leader's points already provide (the relevant effects were not statistically significant). So these two models have not made us any wiser.
3. Mining the table: Predicting the winning tally from the whole table after nine weeks
But maybe it is not only the points won by the top teams that matter? After all, the points the leaders amass might also be related to how strong the teams in the lower half of the table are. This may determine how many 'easy' wins the top teams can expect to pick up. I engaged in some data mining and looked at the correlation between the points tally associated with all 20 positions in the EPL after nine weeks, and the final winning tally. The result, shown in Figure 2, was rather surprising.
Figure 2. Correlations between points of teams in positions 1–20 after nine weeks and EPL winner's final points tally
The points gathered by teams in positions 1, 2 and 3 after nine weeks are positively correlated with the winner's tally: the higher the points of the three top teams in week 9, the higher the eventual winning tally. There is little evidence of any relationship between points tallies of positions 4 to 13 after nine weeks and the winning tally (with the exception of a modest positive correlation associated with position 11). However, there are strong negative correlations between the winning tally and each of positions 14, 15, 16, and 17 (and also 19). This suggests that the fewer points teams in those positions have after nine matches, the more points are required to win the title. Some of these correlations are stronger (albeit in the negative direction) than the association between the 9w leader's points and the winning tally.
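The position-by-position comparison can be sketched as follows. I assume here that the correlations are ordinary Pearson correlations; the function names are hypothetical and the tiny three-position tables are made up for illustration, not taken from the EPL data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlations_by_position(tables, winning_tallies):
    """One correlation per league position.

    tables[s][p] holds the points of the team in position p + 1 after
    nine weeks of season s; winning_tallies[s] is that season's final
    winning tally.
    """
    n_positions = len(tables[0])
    return [
        pearson([season[p] for season in tables], winning_tallies)
        for p in range(n_positions)
    ]


# Made-up data: four seasons, three league positions each.
tables = [[22, 19, 5], [24, 20, 4], [21, 21, 7], [25, 18, 3]]
winning_tallies = [83, 89, 80, 91]

for position, r in enumerate(correlations_by_position(tables, winning_tallies), start=1):
    print(position, round(r, 2))
```

With the real data, `tables` would have 21 rows of 20 values each, and the resulting list would correspond to the 20 bars of Figure 2.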
In Figure 3, we have summarized the points gained by the teams in positions 14-17 after week nine by taking the simple average of these four numbers. This average is strongly negatively associated with the winning tally. Since we found this correlation by searching the data rather than by predicting it in advance, we should be cautious about assuming that there is a definite connection. The association may well be statistical noise and useless for prediction. Only the future will tell!
Figure 3. Relationship between the average number of points gathered by teams in positions 14-17 after nine weeks, and EPL winner's final points tally
The average number of points gathered by the teams in those positions after week nine this year is 9. This is relatively high, as EPL seasons have gone so far: only in six years was this average higher, and once, in 2013/14, it was the same.
Our next model, then, tries to predict the winner's tally based on only the average points tally of the teams in positions 14 to 17 after nine weeks. This predicts a final winner's tally of 83.4, with an 80% prediction interval from 78.2 to 88.7. So, within a point or two, this model roughly agrees with our previous models.
Finally, I investigated whether combining information from the 9w leader's points tally and the average points of positions 14 to 17 would change our guess. The answer is: not much! This model predicts the winner's tally to be 82.9 points, with an 80% prediction interval of 77.4 to 88.4.
Figure 4. Estimates of the number of points needed to win the EPL in 2016/17: Wenger's guess, previous seasons' results, and predictions from five models
Conclusion: Don't relax after 86 points!
Figure 4 gives an overview of all models we have discussed, alongside Wenger's guess and the range of winning tallies from the previous 21 EPL seasons. In contrast to Wenger's confident claim that the title will be won with between 82 and 86 points, statistical analysis suggests more caution. A winner's tally of 87 lies within the 80% prediction interval of all five models. Thus clubs with title aspirations are advised to aim higher than 86. Although Ranieri's Leicester City won the title with 81 points last season, there is no guarantee that this will suffice this time round.
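The claim about 87 points can be checked directly against the five 80% prediction intervals quoted above. The dictionary labels are my own shorthand for the models, and the helper name is hypothetical.

```python
# 80% prediction intervals (lower, upper) as quoted in the text.
intervals = {
    "9w leader": (76.3, 88.6),
    "Top Three": (77.8, 90.3),
    "Top Five": (75.6, 89.5),
    "Positions 14-17": (78.2, 88.7),
    "Leader + positions 14-17": (77.4, 88.4),
}


def covered_by_all(tally):
    """True if the tally lies inside every model's prediction interval."""
    return all(lower <= tally <= upper for lower, upper in intervals.values())


print(covered_by_all(87))  # True
print(covered_by_all(89))  # False: above the upper limit of three of the models
```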
I am grateful to Mario Cortina Borja for his helpful comments on a draft of this article.
- Peter Martin is a research fellow at City, University of London
[1] I excluded the first three EPL seasons, which featured 22 teams.
[2] A practical issue with these data is that in some seasons not all teams have played the same number of games after nine match days. Some teams play their 10th game before others play their 9th. I used the Premier League table at the point in time when the majority of clubs had played nine games. If some teams had played fewer than nine, their points tally was adjusted using the following formula: adjusted.points = 9 × points / no.of.games.played. League positions were recalculated using the adjusted number of points where appropriate. In no case did this change which club was in first position after nine matches. However, in 2007/08 Arsenal were leading the table with 22 points after nine match days, despite having played only eight games up to that point. Their points tally was adjusted to 24.75 according to the formula defined above.
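As a quick illustration of the adjustment formula, here is a sketch in Python. The function and parameter names are my own; only the formula and the Arsenal figures come from the note above.

```python
def adjusted_points(points, games_played, target_games=9):
    """Scale a points tally to what it would be after target_games matches."""
    return target_games * points / games_played


# Arsenal in 2007/08: 22 points from 8 games
print(adjusted_points(22, 8))  # 24.75
```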