A statistical risk-scoring system is used by UK doctors when considering whether to prescribe certain medicines such as statins. But is this an example of the ‘ecological fallacy’?
Preventative medicine is a good thing, right? Yes – but how do we best assess risk before disease occurs? This is a crucial question for both policy makers and clinicians, to make sure medical intervention is properly targeted. Both under-prescription and over-prescription can involve health risks to patients, as well as financial burdens on health providers.
So would you be surprised to find out that you’d been prescribed medication on the basis of your postcode? Thanks to the use of computer-generated risk scores in prescribing, this is what can happen.
Family doctors (known as GPs in the UK) and other clinicians are increasingly using prescribing tools that input personal data into a model to provide a risk score. This is then used – along with an agreed risk threshold – to determine whether or not the patient should be treated with a medication. For example, the UK government’s National Institute for Health and Care Excellence (NICE) recommends a risk model known as QRISK for statin prescribing; it has been in use in the NHS since 2009 and is periodically updated to reflect new clinical findings. If a patient’s calculated QRISK score for having a heart attack over the next 10 years is more than 10%, GPs are advised to prescribe statin medication to reduce the patient’s cholesterol levels, and thus the risk of a heart attack.
So far, so reasonable. But a problem arises from the fact that the model uses aggregated data from across a whole population to provide a single number representing a risk score, based on a limited number of elements of personal data. When this risk score is used for prescribing, factors that may be linked to risk at a population level may not be valid at the individual level – leading to potential errors.
Take the example of a 67-year-old woman living in west London. In a routine check-up, her GP practice inputs a few items of her personal data – height, weight, age, blood pressure, smoker/non-smoker, plus her postcode and some details about her diet – into the current version of the QRISK model, QRISK3. The model also pulls in some clinical data from the patient’s records, and then outputs a risk score: in this case, 12.1%. The practice routinely prescribes statins if a patient’s QRISK score is higher than 10% – so the patient is prescribed statins. But when the patient asks about her cholesterol levels, she is told they are in the normal range – yet the GP practice still wants to prescribe statins. Why?
Perhaps surprisingly, the postcode element is significantly influential in QRISK3. For example, the same personal and clinical data that the clinician entered into the model could, with a different postcode, produce a score below 10% – in which case no statins would be recommended. But this is the same patient, with the same clinical, personal and lifestyle data – so are statins appropriate for her, or not?
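The mechanism can be sketched in a few lines of code. This is not the real QRISK3 formula (whose coefficients are published in Hippisley-Cox et al., 2017); the function name, the clinical baseline of 9.5% and the 2.6-point deprivation adjustment are all invented, purely to show how a postcode-linked term can carry identical clinical inputs across the prescribing threshold:

```python
THRESHOLD = 0.10  # NICE-recommended 10% ten-year risk threshold

def toy_risk_score(clinical_risk: float, deprivation_adjustment: float) -> float:
    """Hypothetical additive model: clinical risk plus a postcode-linked term."""
    return clinical_risk + deprivation_adjustment

same_patient = 0.095  # identical clinical, personal and lifestyle inputs

high_deprivation_postcode = toy_risk_score(same_patient, 0.026)  # 0.121
low_deprivation_postcode = toy_risk_score(same_patient, 0.000)   # 0.095

print(high_deprivation_postcode > THRESHOLD)  # True  -> statins prescribed
print(low_deprivation_postcode > THRESHOLD)   # False -> no statins
```

The patient's data never changes; only the postcode-derived term does, and with it the prescribing decision.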
Uncertainty for individuals
A 2019 study1 by researchers at the University of Manchester suggested that the QRISK model used by GP practices to predict a patient’s risk of developing cardiovascular disease (CVD) could be producing misleading results for some individuals, giving inaccurate risk scores in certain cases, despite the validity of the population data on which they are based. As their research paper states, “Risk prediction models based on routinely collected health data perform well for populations but with great uncertainty for individuals.” According to the study’s lead researcher, Professor Tjeerd van Staa, using models such as QRISK3 “may mean that a patient may have a much lower risk than predicted by QRISK3 (and may not require the statin), or may have a much higher risk than predicted (and not be getting treated with a statin).”
So how exactly does this arise? Van Staa’s study identifies three main sources of potential variability that reduce the accuracy of the risk score. First, GP practices across the UK use different ways of recording data from medical records, due to using different computer systems and clinical coding. According to van Staa, “A patient with a predicted risk of 10% of developing CVD in the next 10 years could have a risk between 7.2% and 13.7%, depending on which practice they came from.”
Another issue is that the inputs to the QRISK model do not capture all the factors that can affect cardiovascular risk. The QRISK score a patient obtains is therefore effectively averaged across the population for the unrepresented factors, which means the patient can obtain a risk score which is significantly adrift from their actual risk.
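A small numerical sketch (with invented figures) shows why an omitted factor produces this averaging. Suppose true 10-year risk depends on a factor the model never records:

```python
# All numbers invented for illustration.
true_risk = {"factor_present": 0.16, "factor_absent": 0.06}
prevalence = 0.5  # half of the patients with matching recorded inputs carry it

# Blind to the factor, the model effectively reports the average risk over
# everyone who shares the patient's *recorded* inputs:
model_score = (prevalence * true_risk["factor_present"]
               + (1 - prevalence) * true_risk["factor_absent"])
print(round(model_score, 2))  # 0.11 -- above a 10% threshold
```

A patient without the factor (true risk 6%) would still be scored at 11% and prescribed; a patient with it (true risk 16%) would be under-scored at the same 11%.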
Third, there is the decision to use postcode data2. The introduction of postcode to such models is intended to incorporate a measure of deprivation (based on the Townsend Deprivation Score3, which assesses socioeconomic-related factors such as unemployment and car and/or home ownership) to address the known issue of under-prescribing of preventive medicines in more deprived populations. Including postcodes increases the QRISK score for people in areas with above-average deprivation scores, thus helping to reduce prescribing inequality across populations. However, at an individual level this can provide a misleading QRISK score, because within any area there is a spread of factors linked to deprivation as well as to CVD risk. It’s also important to remember that postcodes can’t actually cause heart disease, even if they represent a statistical association. In the case of the patient in London, it turned out that her postcode was associated with quite a high deprivation score, which may be due to the relatively high proportion of people in London living in rented accommodation, coupled with lower rates of car ownership relative to less urban areas. In effect, this patient was being prescribed statins based partly on the statistical anomalies of city living, rather than on her own CVD risk.
Ecological studies vs. individual-level analysis
In statistics, when conclusions that are valid at the population level are erroneously applied directly to individuals, this is understood as an error in statistical reasoning known as the ecological fallacy. This fallacy is not confined to modern risk-based models: one very striking 19th-century example is described in a 2021 article titled ‘Be careful with ecological associations’4. In an ecological (population-based) study carried out in Prussia in the late 19th century, it was noticed that districts with a higher proportion of Protestants (compared to other districts with more Catholics) had higher rates of suicide. Being a Protestant therefore appeared to be a risk factor for suicide. However, a closer look at the data at the individual level showed that the majority of suicides were not in fact among Protestants, but among Catholics: Catholics in a predominantly Protestant district were more at risk of suicide. It’s not hard to see why: the actual risk factor for suicide was not the religious affiliation itself, but the fact of being in a religious minority, leading to social isolation. This example neatly demonstrates the ecological fallacy: tempting though it is to infer a risk factor directly from the population data alone, the individual-level analysis showed a correlation that was the reverse of the one indicated by the population study – and revealed the causal mechanism.
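The reversal is easy to reproduce with invented numbers. In the two hypothetical districts below (the counts are made up for illustration; the historical analysis is described in Roumeliotis et al., 2021), the district with the larger Protestant share has the higher overall suicide rate, yet the individual-level rates point the other way:

```python
districts = {
    # name: (protestants, catholics, protestant_suicides, catholic_suicides)
    "A (90% Protestant)": (90_000, 10_000, 18, 30),
    "B (60% Protestant)": (60_000, 40_000, 12, 20),
}

for name, (n_p, n_c, s_p, s_c) in districts.items():
    overall = (s_p + s_c) / (n_p + n_c)
    print(f"{name}: overall rate {overall:.5f}, "
          f"Protestant rate {s_p / n_p:.5f}, Catholic rate {s_c / n_c:.5f}")

# District A, with the larger Protestant share, has the higher overall rate
# (0.00048 vs 0.00032) -- yet within A most suicides (30 of 48) are among
# the Catholic minority, whose individual rate (0.00300) is 15x the
# Protestant rate (0.00020). The ecological inference "Protestantism raises
# risk" points the wrong way at the individual level.
```

The district-level comparison and the individual-level comparison disagree because the small, isolated minority carries the excess risk.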
The authors of this paper cite a further example from 1995–6 of an ecological (population-based) study looking at whether rates of resistance to a specific antibiotic (trimethoprim) were correlated with rates of prescribing this antibiotic at different GP practices. The study found no correlation, so initially antibiotic resistance appeared to be unrelated to prescribing practice at the population level. However, when this study was followed up with one using individual patient records, trimethoprim resistance in an individual was indeed found to be correlated with that person’s past exposure to the antibiotic – a causal relationship that was hidden at the population level. The authors conclude that, to properly understand causal factors (as required for clinical prescribing, for example), an individual-level analysis should be considered to supplement and follow up any apparent correlations (or lack thereof) suggested by population-based studies.
So what does this mean for current GP practices basing their prescribing on population-based models such as QRISK? And what does it mean for us individually as patients? As van Staa’s study concludes, “Risk prediction models based on routinely collected health data perform well for populations but with great uncertainty for individuals. Clinicians and patients need to understand this uncertainty.” Thus, there must always be room for discussion in the GP’s consulting room (or online space) to explore the true causal risk factors, beyond the numerical output of the model. Including postcodes in the input data has the important function of bringing in more people from areas that have a higher deprivation index; but the individual patient may not fit the pattern built into the model.
Risk models and the future
Around the world, risk models and data-driven machine learning algorithms are playing a significant role in medical prescribing. As the QRISK example shows, clinicians would be wise to reflect on how risk scores are arrived at, even when using a risk threshold recommended by national policy. And it’s surely likely that, as clinicians gradually encounter more AI tools based on data patterns and algorithms, while also responding to the need for more preventative prescribing within primary care, the need for this reflective awareness will only increase. As it becomes harder to interrogate the assumptions built into models, a willingness to follow up the outputs with further investigation, where appropriate, will become more important for us all.
References
- Li, Y., Sperrin, M., Belmonte, M., Ashcroft, D.M., Pate, A., van Staa, T.P. (2019) Do population-level risk prediction models that use routinely collected health data reliably predict individual risks?. Sci Rep 9, 11222. https://doi.org/10.1038/s41598-019-47712-5
- Hippisley-Cox, J., et al (2017) Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study. British Medical Journal, 357:j2099. https://doi.org/10.1136/bmj.j2099
- Yousaf, S., Bonsall, A. (2017) UK Townsend Deprivation Scores from 2011 census data. UK Data Service. https://tinyurl.com/428y662e
- Roumeliotis, S., et al (2021) Be careful with ecological associations. Nephrology, 26, 501–505. https://onlinelibrary.wiley.com/doi/pdf/10.1111/nep.13861
Susan Watt is a science writer based in London, UK.