Just last week there was much consternation as a Populus poll put Labour five points in the lead, while a Lord Ashcroft poll over the same fieldwork dates put them six points behind. Much of the brouhaha is down to statistical illiteracy and a poor understanding of the sampling and weighting methods used in the polling industry, with some basic misconceptions common even in relatively informed debate.
One of the typical explanations put forward for polls that are surprising or contradictory is ‘margin of error’: the idea that, for any given poll, there is a known likelihood that the observed result lies within a few points of the true value for the entire population, depending on the size of the sample. So it is not unreasonable to suggest that the stomach ulcer-inducing YouGov poll in Scotland, and last week’s apparent divergence in the polls, are down to simple sampling error.
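To make that intuition concrete, here is a minimal sketch (in Python, with the textbook formula for a simple random sample) of where the oft-quoted ‘plus or minus three points’ for a poll of 1,000 comes from. The important caveat, developed below, is that this figure assumes a pure random sample, which no published poll actually is.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Textbook 95% margin of error for a simple random sample,
    using the worst case p = 0.5 unless another share is supplied."""
    return z * math.sqrt(p * (1 - p) / n)

# A sample of 1,000 respondents gives the familiar 'plus or minus
# three points' (more precisely, about 3.1 points).
print(f"+/- {margin_of_error(1000):.1%}")
```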
But the margin of error applies only to random sampling error, and none of the opinion polls reported in the media today could be described as pure random probability polls. At a minimum, samples are reweighted to be representative of the demographics of the general population, while pollsters must deal with the potential for systematic errors introduced by non-response (e.g. those in the population who do not answer their phones or do not own them at all). Internet pollsters, in particular, must account for their samples being self-selecting and likely more educated and wealthier than the general population.
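To give a flavour of what that reweighting involves, here is a deliberately simplified sketch of post-stratification on a single demographic variable. The sample composition, population shares and vote splits are all invented for illustration, and real pollsters weight on several variables simultaneously (often by raking) rather than just one.

```python
import pandas as pd

# Hypothetical sample of 1,000 that under-represents younger respondents.
sample = pd.DataFrame({
    "age_group": ["18-34"] * 150 + ["35-54"] * 400 + ["55+"] * 450,
    "vote": (["Lab"] * 80 + ["Con"] * 70        # 18-34
             + ["Lab"] * 180 + ["Con"] * 220    # 35-54
             + ["Lab"] * 180 + ["Con"] * 270),  # 55+
})

# Assumed population shares for each age group (illustrative only).
population_share = {"18-34": 0.28, "35-54": 0.34, "55+": 0.38}

# Weight = population share / sample share for the respondent's group.
sample_share = sample["age_group"].value_counts(normalize=True)
sample["weight"] = sample["age_group"].map(
    lambda g: population_share[g] / sample_share[g]
)

print(sample["vote"].value_counts(normalize=True))   # unweighted shares
print(sample.groupby("vote")["weight"].sum()
      / sample["weight"].sum())                      # weighted shares
```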
The varying composition of samples over time matters because it can produce movements in the polls that are due to differential response rates among partisans – i.e. people being more or less likely to respond to surveys depending on whether their party is doing well or badly. It also matters because it has the potential to produce even wilder deviations from the underlying value than a pure random sample would, as the toy simulation below illustrates.
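In the sketch below the underlying support for both parties is held fixed, and only Labour supporters' willingness to respond changes, yet the unweighted two-party split moves by several points. All figures are invented, and supporters of other parties are ignored for simplicity.

```python
import random

random.seed(1)

TRUE_LAB, TRUE_CON = 0.35, 0.35  # underlying support, held constant

def simulated_poll(lab_response_rate, con_response_rate, n_contacts=5000):
    """Contact a random slice of the electorate; partisans respond at
    different rates, so the unweighted two-party split drifts even
    though true support never moves."""
    responses = []
    for _ in range(n_contacts):
        r = random.random()
        if r < TRUE_LAB and random.random() < lab_response_rate:
            responses.append("Lab")
        elif TRUE_LAB <= r < TRUE_LAB + TRUE_CON and random.random() < con_response_rate:
            responses.append("Con")
    lab = responses.count("Lab") / len(responses)
    return lab, 1 - lab

print(simulated_poll(0.30, 0.30))  # roughly level pegging
print(simulated_poll(0.36, 0.30))  # a Labour 'surge' from enthusiasm alone
```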
All these issues are well known to pollsters, who go to great lengths to ensure their samples are representative. Even if we were to accept that the margin of error for a poll of 1,000 respondents is around three percentage points, despite all the methodological adjustments that go on, this is not the end of the story.
Depending on the point in the electoral cycle, a substantial proportion of respondents (anything between 20% and 40%) may indicate that they do not intend to vote or do not know who they would vote for. Some pollsters exclude respondents who rate their likelihood of voting on election day at less than 5 out of 10. This means that within any sample there are two groups: voters and non-voters.
The upward adjustment of the ‘headline figure’ of vote intention, excluding non-voters, may give a more accurate picture of the final result (often the Conservatives and Labour will be struggling on around 20% of the full sample before the figure is recalculated). But it also inflates the sampling error: a deviation of one percentage point in the raw figures can be amplified once a sizeable share of respondents is excluded from the final estimates of vote intention. It is also crucial to note that the usual margin of error does not apply to the difference between parties' estimated vote shares, which is subject to a considerably wider margin, as the sketch below illustrates.
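Carrying on the earlier back-of-the-envelope sketch, the arithmetic below shows how filtering down to likely voters widens the margin of error, and how much wider still the margin on the Labour lead is than the margin on either party's individual share. The shares and sample sizes are purely illustrative.

```python
import math

def moe(n, p=0.5, z=1.96):
    """95% margin of error on a single share, simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)

# 1,000 interviews, but only 600 pass the likelihood-to-vote filter
# and give a vote intention.
print(f"all respondents: +/- {moe(1000):.1%}")   # ~3.1 points
print(f"likely voters:   +/- {moe(600):.1%}")    # ~4.0 points

# Margin of error on the lead (Lab minus Con). The two shares are
# negatively correlated, so the margin on the difference is wider
# than the margin on either share alone.
p_lab, p_con, n_eff = 0.36, 0.33, 600
var_lead = (p_lab * (1 - p_lab) + p_con * (1 - p_con)
            + 2 * p_lab * p_con) / n_eff
print(f"Labour lead:     +/- {1.96 * math.sqrt(var_lead):.1%}")  # ~6.6 points
```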
There are good reasons for the methodological adjustments that pollsters make to their samples, such as weighting for likelihood to vote or for recall of past voting behaviour. This has served the UK polling industry relatively well in predicting the outcome of general elections in recent years.
Such weighting will often reduce the error variance observed in the polls, by ensuring the sample looks like the voting population before voting preferences are measured. However, this depends on the weights remaining fairly constant over time, and it has the potential to introduce additional error, for example if the weighting variables, such as likelihood to vote, are not exogenous to the campaign.
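One way to see the cost of weighting is Kish's approximation of the effective sample size, which shrinks as the weights become more unequal. The weights below are invented for illustration; the point is simply that a heavily weighted poll of 1,000 behaves more like a smaller one.

```python
def kish_effective_n(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

# Hypothetical weights: most respondents close to 1, a hard-to-reach
# group weighted up sharply.
weights = [1.0] * 800 + [0.6] * 100 + [3.5] * 100
print(f"nominal n = {len(weights)}, effective n = {kish_effective_n(weights):.0f}")
# The quoted margin of error should really be based on the effective
# sample size (roughly 710 here), not the nominal 1,000.
```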
This means that most of our discussion of the polls overlooks a fundamental uncertainty about how much noise we should expect in the polls from day to day or week to week. We simply do not know the distribution of sampling error. Nor do we know how this error variance might vary across pollsters as a result of the methodological decisions they make. It might be that one pollster’s method makes for much more volatile numbers than another’s, but it is difficult to disentangle this from actual changes in underlying public opinion.
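In principle one can compare the variance a pollster's published series actually displays with the variance pure random sampling would imply, as in the sketch below (all figures made up). The difficulty, as noted above, is that any excess could be genuine movement in opinion, house effects or design effects from weighting, and the series alone cannot separate them.

```python
import statistics

# Hypothetical series of Labour shares from one pollster, each n = 1,000.
shares = [0.36, 0.33, 0.38, 0.34, 0.37, 0.32, 0.39, 0.35]
n = 1000

p = statistics.mean(shares)
observed_var = statistics.variance(shares)   # what the series shows
expected_var = p * (1 - p) / n               # pure random sampling, flat opinion

print(f"observed variance: {observed_var:.6f}")
print(f"expected variance: {expected_var:.6f}")
# An observed variance well above the expected one signals something
# beyond simple sampling error, but not what that something is.
```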
We are able to calculate systematic differences in response patterns across pollsters, but this does not tell us whether what looks like a rogue poll is simply down to sampling error (and bad luck for the pollster) or due to a systematic error in the way that public opinion is being measured. All this makes it imperative to treat individual polls with healthy caution and to look at an aggregation of polls across polling houses, as we do with the Polling Observatory. It is also important not to rush to cast aspersions when a pollster produces a number that defies our expectations – or does not fit with that day’s conventional wisdom about the prevailing winds of politics.
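As a very crude approximation of what aggregation buys, the sketch below pools several invented polls with a precision-weighted average, weighting each by the inverse of its approximate sampling variance. The Polling Observatory's actual method also estimates house effects and change over time, which this deliberately ignores.

```python
# Invented polls: each reports a Labour share and a sample size.
polls = [
    {"house": "A", "lab": 0.37, "n": 1002},
    {"house": "B", "lab": 0.32, "n": 1980},
    {"house": "C", "lab": 0.35, "n": 1654},
]

def pooled_share(polls, key="lab"):
    """Precision-weighted average: each poll weighted by 1 / (p(1-p)/n)."""
    weights = [poll["n"] / (poll[key] * (1 - poll[key])) for poll in polls]
    total = sum(weights)
    return sum(w * poll[key] for w, poll in zip(weights, polls)) / total

print(f"pooled Labour share: {pooled_share(polls):.1%}")  # ~34.1%
```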
Certainly the idea that political parties and the media should regulate the polling industry in response to such events is risible, especially given the widespread statistical illiteracy regarding the probabilistic nature of polling (a debate which struggles even to cope with the idea of margin of error), with every small movement in the polls being over-analysed or exploited for political gain.
That is not even to mention the obvious conflict of interest that would be involved in politicians regulating the activities of polling companies, and passing judgement on whether the wording of surveys is appropriate. Instead of overreacting when the polls appear to misbehave, we might be better off analysing how pollsters are able to produce such stable estimates of voter preferences and how they get close to the election result so often.