England fans might have been disappointed to see their football team crash out of the Euro 2016 tournament earlier this week, but they never stood much chance of winning. According to the statisticians at the Norwegian Computing Center (NCC), the probability of England lifting the trophy never got above 5%. Spain is perhaps the biggest surprise: at certain points in the tournament, the probability of the team finishing first was 20% or more.

Over at the NCC’s website, we now see that Belgium is the current favourite, at 27.7%. This probability is based on simulations of games played thousands of times. Curious as to how this is done, we emailed chief research scientist and statistician Magne Aldrin to find out. Aldrin is pictured below, second from right, with his colleagues (from left) assistant research director Anders Løland, senior research scientist Ragnar Bang Huseby, and research scientist Nikolai Sellereite.

Let’s start with the basics: How do you simulate a football match that hasn’t yet been played? How does the computer determine which teams are likely to score in order to arrive at a predicted result?
Magne Aldrin:
A football match has both a systematic element and a random component. If one team is better than the other it will have a higher chance of winning, but, as the saying goes, “the ball is round”, so sometimes the best team will lose. The systematic part is taken care of by assigning a strength parameter to each team. The random part is accounted for by letting the number of goals scored by each team vary from one match to another: we assume that each team’s goal count is Poisson distributed, with an expectation that depends on the strengths of the two teams.

So, the first thing we have to do is quantify the strength of each team. If we were looking at the final weeks of the Premier League, for instance, we could estimate the strength parameters from the outcomes of all the matches played so far. However, a tournament like the ongoing Euro 2016 is a special case. France, as host nation, has a home advantage, and that should be included in its strength parameter. Another aspect is that some teams are known to perform well in tournaments. Italy is one such team, even if it currently has an unusually low FIFA ranking. We therefore use football experts to help us quantify the team strengths before tournaments.

However, this creates a new problem: these people are experts on football, not on probabilities! To overcome this, we give each expert a set of hypothetical matches between pairs of teams, and ask them to guess three reasonable results in each match. Before the tournament starts, we estimate the team strengths based on these hypothetical results, but these are later updated based on the real match results as the tournament moves forward.
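
As an illustration of the match model Aldrin describes, here is a minimal Python sketch. It is not the NCC’s code: the log-linear link between strengths and expected goals, the `base_rate` of 1.3 goals and all function names are assumptions made for this example. The interview only states that goals are Poisson distributed with expectations that depend on the two teams’ strengths.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson-distributed integer (Knuth's multiplication method)."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def expected_goals(attacker_strength, defender_strength, base_rate=1.3):
    """Expected goals for one team.

    ASSUMPTION: a log-linear link and a base rate of 1.3 goals are
    illustrative choices, not taken from the interview.
    """
    return base_rate * math.exp(attacker_strength - defender_strength)


def simulate_match(strength_a, strength_b, rng):
    """Simulate one match and return (goals for A, goals for B)."""
    goals_a = poisson_sample(expected_goals(strength_a, strength_b), rng)
    goals_b = poisson_sample(expected_goals(strength_b, strength_a), rng)
    return goals_a, goals_b


if __name__ == "__main__":
    rng = random.Random(2016)
    # Hypothetical strengths: team A is somewhat stronger than team B.
    print(simulate_match(0.3, -0.1, rng))
```

Because each score is a random draw, the stronger team wins more often across repeated simulations but can still lose any single match.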

How do you arrive at your final predictions for each team?
MA:
Each time we simulate the whole tournament, we get a winner, the finalists, the teams that reach the semi-finals, etc. When we have simulated the whole tournament many times, we simply count the percentage of the simulations that a certain team has won, reached the final, reached the semi-finals etc. This is an easy way to calculate the probabilities.
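
The counting step can be sketched as follows for a small knockout bracket. This is an illustrative sketch under stated assumptions, not the NCC’s implementation: the four teams, their strengths and the goal model are hypothetical, the bracket size must be a power of two, and a drawn knockout match is settled here by a coin flip rather than by extra time and penalties.

```python
import math
import random
from collections import Counter


def poisson_sample(mean, rng):
    """Draw one Poisson-distributed integer (Knuth's multiplication method)."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def play_knockout(team_a, team_b, strengths, rng):
    """Return the winner of one knockout match.

    ASSUMPTIONS: log-linear goal expectations with a base rate of 1.3,
    and a coin flip to settle draws.
    """
    mean_a = 1.3 * math.exp(strengths[team_a] - strengths[team_b])
    mean_b = 1.3 * math.exp(strengths[team_b] - strengths[team_a])
    goals_a = poisson_sample(mean_a, rng)
    goals_b = poisson_sample(mean_b, rng)
    if goals_a == goals_b:
        return rng.choice([team_a, team_b])
    return team_a if goals_a > goals_b else team_b


def simulate_bracket(bracket, strengths, rng):
    """Play the bracket round by round until one team remains."""
    teams = list(bracket)
    while len(teams) > 1:
        teams = [
            play_knockout(teams[i], teams[i + 1], strengths, rng)
            for i in range(0, len(teams), 2)
        ]
    return teams[0]


def win_probabilities(bracket, strengths, n_sims, seed=0):
    """Estimate each team's chance of winning as its share of simulated titles."""
    rng = random.Random(seed)
    wins = Counter(simulate_bracket(bracket, strengths, rng) for _ in range(n_sims))
    return {team: wins[team] / n_sims for team in bracket}


if __name__ == "__main__":
    # Hypothetical teams and strengths, for illustration only.
    strengths = {"A": 0.4, "B": 0.0, "C": -0.2, "D": -0.4}
    print(win_probabilities(["A", "B", "C", "D"], strengths, n_sims=50_000))
```

The same counter could be kept for finalists and semi-finalists by recording which teams survive each round.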

How frequently are the predictions updated, and how do the results of actual games affect the revised predictions?
MA:
The predictions are updated after the last match every night. First, the strength parameters are re-estimated, taking into account the results in all matches that have been played so far. Second, the rest of the tournament is simulated, or “played”, 50,000 times. The revised predictions thus take into account the revised strength parameters, but also the new match-ups between teams.

You’ve been predicting tournament outcomes since 1998: how accurate have your predictions been?
MA:
These are probabilistic predictions, so accuracy has two aspects. First, the probabilities should be well calibrated. The probabilities for the most probable outcome (i.e. win, draw or lose at the group stage) in each match vary from 34% to about 80%, and are on average around 50%. Thus, if we always bet on the favourite, we should make the correct bet 50% of the time in the long run. In that sense, both too few and too many correct bets indicate that the model is wrong. By comparing probability predictions with match results we have confirmed that the system is well calibrated.
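
The favourite-based check described above can be written in a few lines of Python. This is a sketch with made-up forecasts, not the NCC’s data or code; the function name and the data layout (a probability triple plus the index of the outcome that occurred) are assumptions for the example.

```python
def favourite_calibration(forecasts):
    """Compare the expected and observed hit rate when backing the favourite.

    `forecasts` is a list of (probs, outcome) pairs, where `probs` holds the
    predicted probabilities for (win, draw, lose) and `outcome` is the index
    of the result that actually occurred.
    """
    n = len(forecasts)
    # If the model is well calibrated, the favourite should come in about
    # as often as its average predicted probability.
    expected_rate = sum(max(probs) for probs, _ in forecasts) / n
    hits = sum(
        1 for probs, outcome in forecasts if probs.index(max(probs)) == outcome
    )
    return expected_rate, hits / n


if __name__ == "__main__":
    # Made-up forecasts, for illustration only.
    forecasts = [
        ((0.5, 0.3, 0.2), 0),
        ((0.5, 0.3, 0.2), 1),
        ((0.2, 0.3, 0.5), 2),
        ((0.2, 0.3, 0.5), 0),
    ]
    expected_rate, observed_rate = favourite_calibration(forecasts)
    print(f"expected {expected_rate:.2f}, observed {observed_rate:.2f}")
```

An observed rate far above or far below the expected rate would both point to miscalibration, which matches the remark that too many correct bets is as much a warning sign as too few.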

The other aspect of accuracy is whether we make better predictions than others. During each championship, I participate in several informal private or public betting competitions with 10 to 500 participants, where I bet slavishly according to the model. In these instances, I am always among the top 25% of participants, often in the top 10%, and sometimes I win, which means that these predictions are clearly better than the average participant’s. This is reasonable, since we use a synthesis of the opinions of several experts (and also match data once the tournament has started).

However, I must admit that in the European Championship in Portugal in 2004 our initial probability of Greece winning was 0.0%, but they went on to win it. The 0.0% prediction was partly a result of our experts having no belief in Greece, and partly because we rounded the small probability down to 0.0%. But we have learnt from this. Since then, we always report at least 0.1% if the real probability is non-zero. We have also improved the modelling by introducing a so-called “shrinkage factor”, where the strength parameters are shrunk towards each other, i.e. the teams are forced to have more equal strengths than the experts indicate.
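
The two fixes Aldrin mentions can be illustrated with a short Python sketch. The interview does not give the NCC’s formula, so shrinking each strength towards the mean of all strengths by a constant factor is an assumption made here, as are the function names and the example values.

```python
def shrink_strengths(strengths, factor):
    """Pull every strength towards the common mean.

    ASSUMPTION: linear shrinkage towards the mean; factor=1 leaves the
    strengths unchanged and factor=0 makes all teams equally strong.
    """
    mean = sum(strengths.values()) / len(strengths)
    return {team: mean + factor * (s - mean) for team, s in strengths.items()}


def report_percentage(probability):
    """Format a probability as a percentage, never rounding a non-zero value to 0.0%."""
    if probability <= 0:
        return "0.0%"
    return f"{max(round(probability * 100, 1), 0.1):.1f}%"


if __name__ == "__main__":
    # Hypothetical expert-based strengths, for illustration only.
    print(shrink_strengths({"Favourite": 1.0, "Outsider": -1.0}, factor=0.5))
    print(report_percentage(0.0002))
```

Shrinking the strengths narrows the gap between favourites and outsiders, so long shots receive a somewhat larger simulated chance than the experts’ raw opinions would imply.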

You said in our initial email correspondence that these predictions were “just for fun”, but what are some of the serious applications for prediction methods such as these?
MA:
Well, we make these predictions to demonstrate that statistical modelling can be used for something fun and to hopefully generate some interest in statistics and models among those who don’t normally pay attention to such things. Of course, for betting companies this is serious work, and my colleagues and I make predictions all the time: the number of train passengers on every train next month, the electricity price tomorrow, or global temperatures 15 years ahead.

Significance Magazine