Are there enough traditional statisticians – rather than data analysts – working in the tech industry? The new digital elite must recognise the vital role of statistics and quality assurance in the AI boom. Here’s why.
Artificial intelligence (AI) has broken into the mainstream, with profound implications for our society and the economy. Speculation aside, the consensus appears to be that, whether through the creative destruction of old jobs and the emergence of new ones, or the overhaul of entire sectors of the economy, society in 10 years’ time will look very different from today.
So, what role does statistics have to play in the new era of AI? Statistics is often perceived by the public as a somewhat obscure and archaic discipline, perhaps less exciting than its modern counterpart, “data science”. However, the field of statistics provides the formal rigour and robustness that is often absent from its more contemporary relatives, machine learning and AI.
Global AI ethics regulations clearly demonstrate that governments perceive the “wildcard” nature of AI to be a potential threat. This is why quality control is so important, and why the discipline of statistics has such a large role to play in providing it.
For example:
- Quality control of hallucinations (i.e. invented facts presented as if they were real). Statistical process control is well recognised in industry as the ideal way to distinguish signal from noise, avoid knee-jerk reactions and perceive the underlying trends – AI needs to embrace this (see the first sketch after this list).
- Model validation and performance metrics. Too often a model is validated once, then erroneously applied under circumstances that have changed; a statistician knows to look out for this.
- Statistical comparison between different large language models (LLMs). There are many metrics and benchmarks for measuring the performance of different LLMs, but no standardised way of comparing performance across them. Big enterprises report small performance gains as major achievements over their competitors, without applying any standardised hypothesis tests (the second sketch after this list shows one such test).
- Longitudinal studies on the impact of AI. There is much discussion of how AI will affect a range of issues, from unemployment to inequality, but as yet no rigorous longitudinal studies of any of them.
- Bias recognition. Similarly, there are no standardised processes for measuring and understanding bias in LLMs. Statistical science can provide the methodology needed to measure, and ultimately mitigate, that bias.
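To make the first of these points concrete, here is a minimal sketch of a statistical process control chart (a p-chart) applied to hallucination monitoring. It assumes a fixed number of model responses is audited each day and the number containing hallucinations is recorded; the daily counts are invented purely for illustration.

```python
# Sketch: a p-chart (statistical process control) for the daily hallucination
# rate of a deployed model. Counts are invented for illustration; in practice
# they would come from an audit of a fixed-size sample of responses each day.
import numpy as np

n = 200                                                            # responses audited per day
hallucinations = np.array([11, 9, 14, 10, 8, 12, 13, 9, 10, 25])  # daily hallucination counts
p = hallucinations / n                                             # observed daily proportions

p_bar = p.mean()                          # centre line: average proportion
sigma = np.sqrt(p_bar * (1 - p_bar) / n)  # binomial standard error
ucl = p_bar + 3 * sigma                   # upper control limit
lcl = max(p_bar - 3 * sigma, 0.0)         # lower control limit, floored at zero

for day, rate in enumerate(p, start=1):
    status = "OUT OF CONTROL" if rate > ucl or rate < lcl else "in control"
    print(f"Day {day:2d}: rate={rate:.3f}  [{status}]")
print(f"Centre line={p_bar:.3f}, UCL={ucl:.3f}, LCL={lcl:.3f}")
```

Only points outside the control limits call for intervention; day-to-day wobble within the limits is noise, which is exactly the knee-jerk reaction the chart is designed to prevent.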
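And for the comparison of LLMs, below is a sketch of one standardised test that could back up a claimed accuracy gap between two models evaluated on the same benchmark items: an exact McNemar-style test on the items where the models disagree. The per-item scores are simulated, not real benchmark results.

```python
# Sketch: paired comparison of two LLMs scored (0/1) on the same benchmark items.
# The scores are simulated; the point is that an accuracy gap should be tested
# formally rather than reported as a raw percentage difference.
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(42)
n_items = 500
model_a = rng.binomial(1, 0.76, n_items)  # simulated per-item correctness, model A
model_b = rng.binomial(1, 0.73, n_items)  # simulated per-item correctness, model B

a_only = int(np.sum((model_a == 1) & (model_b == 0)))  # A right, B wrong
b_only = int(np.sum((model_a == 0) & (model_b == 1)))  # B right, A wrong

# Under the null hypothesis of no real difference, each discordant item favours
# A or B with probability 0.5, so an exact binomial test applies.
result = binomtest(a_only, a_only + b_only, p=0.5)
print(f"Accuracy A={model_a.mean():.3f}, B={model_b.mean():.3f}")
print(f"Discordant items={a_only + b_only}, two-sided p-value={result.pvalue:.3f}")
```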
As we embrace the AI revolution, we must also acknowledge the challenges it brings. From ethical dilemmas to the risk of job displacement, the road ahead is fraught with complexities. Yet, with the rigorous application of statistical methods and a steadfast commitment to quality assurance, we can navigate these challenges and harness AI’s full potential.
It is essential that the statistical community embraces the challenge of AI and argues passionately for the inclusion of statisticians in all aspects of the new reality. We must promote our case by demonstrating the vital role that statistical thinking plays in all successful ventures. How many times have groundbreaking results nearly been missed for want of a clear overview – of stepping back from the minutiae to perceive the underlying trends or the effect of random variation, and of sifting through the influences on a measure to identify the key factors that drive the outcomes?
As AI becomes increasingly integrated into our daily lives and more accessible to a wider audience, the discourse on its trustworthiness will intensify. This is compounded by AI’s evolving role in analytics, positioning it as an essential tool for analysts. However, the burgeoning reliance on AI necessitates a critical approach. Statisticians and data professionals should advocate for rigorous statistical methodologies rather than merely accepting AI-generated insights. Although AI represents one of the most significant advances in productivity tools, the decisive factor should always be the combination of domain knowledge and technical expertise, ensuring that human judgment remains paramount.
Statistics is often counter-intuitive – it may seem obvious to throw more data at a problem, but careful measurement of a representative sample is more productive than doubling the data and hoping for the best. How often can the key to a problem be traced to poor measurement or ill-chosen definitions? The experienced statistician knows to look for these “easy wins”.
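A small simulation, with entirely made-up numbers, illustrates the point: a modest representative sample estimates a population mean far better than a sample twice the size drawn from an unrepresentative slice of the same population.

```python
# Sketch: representative sampling beats "more data" when the extra data is biased.
# All numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=50, scale=10, size=100_000)   # true mean is about 50

# (a) small but representative: a simple random sample of 500
representative = rng.choice(population, size=500, replace=False)

# (b) twice as much data, but drawn only from the upper half of the population
biased_pool = population[population > np.median(population)]
biased = rng.choice(biased_pool, size=1000, replace=False)

print(f"True mean:               {population.mean():.2f}")
print(f"Representative (n=500):  {representative.mean():.2f}")
print(f"Biased sample (n=1000):  {biased.mean():.2f}")
```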
The deep, careful thinking of a statistician plays a fundamental role in the AI team and must not be left out. The statistical community must be forthright in publicising achievements that showcase its value. Blowing our own trumpet should be an absolute requirement of all the work we undertake.
Stylianos Kampakis is a data scientist, chartered statistician, AI engineer and blockchain and tokenomics expert.
Shirley Coleman is a chartered statistician and winner of the RSS 2024 Greenfield Industrial Medal.