Statisticians can ply their trade anywhere it’s needed, bringing benefits to friends, family and the wider world. Find out how one statistician and father used his training to help improve therapy for autistic children like his daughter
As a parent, few things hit harder than being told that your child might never be able to function independently in society. For me and my wife, this happened back in 2022 when our youngest, then two years old, was diagnosed with level 2 autism.
Being scientists ourselves (she’s a chemist and I’m a statistician), we immediately read as much as we could about the diagnosis so that we could make informed decisions about how to support our child and, given time, possibly help her overcome this ordeal. In addition to physical, occupational and speech therapy, we read about a promising, research-backed approach designed specifically for children with autism and other developmental conditions: applied behavior analysis (ABA) therapy. I was fascinated by ABA, specifically the simple yet highly intuitive idea of deconstructing complex competencies into individual components and designing activities to practise and test each component.
For example, if the goal is to teach a child how to wash their hands, the components might be: 1) turning on the faucet, 2) running the water through their hands, 3) turning the faucet off, and 4) wiping their hands dry. Unlike neurotypical children, who may see the connections between these steps easily and thus move through the entire lesson quickly, same-age children who have level 2 or 3 autism are less likely to do so and must be taught step by step1. Thus, the therapist has the child practise each step repeatedly, proceeding to the next step only once they are confident that the child has learned the previous one. When this was explained to me, I immediately asked the therapist how they would make that decision and, sure enough, that part is quite systematic as well. They told me that they tested the task periodically, with a ‘washout period’ between tests, and that the child was considered to have learned sufficiently if they could complete the task at a set performance criterion, such as succeeding in 4 out of 5 testing trials. I was very impressed and, by this point, was confident that the therapist would have a similarly rigorous response to the natural next question: how do you decide what performance criterion to use?
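The mastery rule the therapist described can be sketched as a simple check. This is an illustrative example of mine, not code from any ABA system, with the “4 out of 5 trials” criterion as a default:

```python
def meets_criterion(trial_results, required=4, out_of=5):
    """Decide mastery from probe-trial outcomes.

    trial_results: list of booleans, one per testing trial (True = success).
    The child is deemed to have mastered the step if, in the most recent
    `out_of` trials, at least `required` were successes.
    """
    recent = trial_results[-out_of:]
    # Require a full set of probe trials before making any decision.
    return len(recent) == out_of and sum(recent) >= required

# 4 successes in the last 5 probes -> criterion met
print(meets_criterion([True, True, False, True, True]))   # True
# Only 3 of 5 -> criterion not met
print(meets_criterion([True, False, False, True, True]))  # False
```

The defaults (`required=4`, `out_of=5`) are just the example criterion from the orientation; in practice these are exactly the numbers whose choice the rest of the article questions.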
Bayesian probability
We went home from the ABA orientation satisfied that our daughter was in good hands at the therapy centre, but also mildly curious about the therapist’s admission that there was no standard practice for setting the performance criterion. Over the next week, I read up on the topic, as it seemed unlikely that nothing had been done about this. True enough, while there was consensus that higher criteria led to better observed performance (e.g. a child still being able to wash their hands when asked to do so two months after mastering the skill in ABA therapy), there was not much work towards a common way of selecting the criteria2. The literature showed that practitioners were on the right track on aspects of this selection, such as how a higher number of testing attempts at the same criterion percentage seemed to lead to better outcomes3, but no consistent quantitative rationale was offered as to why. There was no theoretically grounded, collectively accepted way to assess performance criterion choices.
So, I proposed one.
Based on reading the literature and speaking with our ABA therapist, it became apparent to me that practitioners are essentially interested in the probability that a child can perform a task when next prompted, given that they were able to do so X times in the last N independent trials. This is well represented by a random variable p whose posterior, under the beta-binomial conjugate model, is a beta distribution: yes, the first posterior distribution that one is likely to meet in an introductory Bayesian statistics course. Starting typically with an uninformative beta prior, such as the standard uniform or Jeffreys prior, we take the observed data of X successes out of N trials and update the prior’s parameters to obtain the posterior. The simplicity is a good fit because, ethically, practitioners must tailor ABA therapy activities to each child and will not assume that one child can be counted on to perform better than another based on some confounding factor, such as race or age. Thus, truly, all the data they have to work with is the evidence meticulously collected from the trials that they prepare.

The prevailing implicit assumption prior to my work was that if a child correctly completes 8 out of 10 trials, their probability of success on another trial is 80%. More importantly, this assumption does not necessarily change in practice even when the number of trials is reduced: a child who completes 4 out of 5 trials correctly may also be considered to have achieved 80% mastery. My paper4 argues that this reasoning is not exactly valid, and that there is a way to estimate the uncertainty in a child’s true mastery level based on their achievement of the performance criteria. Taking this uncertainty into account in relation to what researchers observed in practice helped explain other findings from their experimental studies, such as the variability in children’s skill retention when reassessed after an extended period.
Previously, this variability, which was typically a reduction in the child’s mastery level on follow-up, was attributed entirely to skill loss. I pointed out that it is a combination of skill loss and unaccounted variability in skill acquisition and demonstration.
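To make the uncertainty point concrete, here is a minimal sketch of mine (not code from the paper) comparing the beta posteriors for 8-of-10 and 4-of-5 under a uniform Beta(1, 1) prior. Both records nominally meet an “80%” criterion, yet the posteriors differ in both centre and spread:

```python
import math

def beta_posterior_summary(successes, trials, prior_a=1.0, prior_b=1.0):
    """Posterior mean and standard deviation of the success probability p,
    given a Beta(prior_a, prior_b) prior and `successes` out of `trials`.

    Conjugacy: posterior is Beta(prior_a + successes,
                                 prior_b + trials - successes).
    """
    a = prior_a + successes
    b = prior_b + (trials - successes)
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd

# Both observations meet a nominal "80%" performance criterion...
m10, s10 = beta_posterior_summary(8, 10)  # mean = 0.75, sd ~ 0.12
m5, s5 = beta_posterior_summary(4, 5)     # mean ~ 0.71, sd ~ 0.16
# ...but the 4-of-5 record leaves noticeably more uncertainty about
# the child's true mastery level than the 8-of-10 record does.
```

The wider posterior for the shorter trial series is the unaccounted variability in skill acquisition and demonstration described above; some of what looks like skill loss at follow-up is simply this uncertainty never having been measured.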
This anecdote demonstrates a simple reality: for statisticians, potential applications of our trade can be as close as our own backyard or our neighbour’s house. Like the plumber who goes to dinner at a friend’s and notices a faint but distinctive brown stain on the ceiling, a telltale sign of a slow leak that is unlikely to catch a homeowner’s attention before it’s too late, we see things others might not because we are trained to see them. One of my former bosses, the biostatistics branch chief at the National Cancer Institute, is fond of saying that “as statisticians, we get to play in other people’s sandboxes.” To this I add: “and the sandboxes are all around us”. It’s not just that collaboration you’re doing with the biology department as part of your dissertation, not just the projects you’re involved in for your postdoc, or the grants you’re signed onto as junior research faculty. There are many, often simpler, opportunities that you may encounter which can benefit from your unique insight; don’t be afraid to explore them when you see them. The mathematics does not need to be complex to be useful. There’s nothing complex about preparing a beta-binomial posterior for ABA therapists to gain better insight into their students’ true mastery levels relative to set performance criteria. Look, it’s simple enough that even a frequentist like me could do it!
To be clear, this is not an invitation to treat every problem as a nail for your hammer. Yes, the solution does not need to be complex, but it needs to make sense. Some problems that trigger your statistical spider sense may have simple solutions, while others may require more complex steps to solve, and, yes, funding! Instead, this is an invitation to be ever cognisant of the world around you, and of how you as a statistician can contribute to its betterment. Even within the scientific circles where you live and work, your knowledge is invaluable to your colleagues in more ways than just doing data analysis. For example, while everyone who does quantitative research can be expected to have at least some working understanding of what a p-value is, your technical expertise puts you in a critical position to know exactly what a p-value is and, more importantly, what it should and should not be used for. Many criticisms of data analysis levelled against modern-day research, such as p-hacking or double-dipping, do not necessarily describe intentional or malicious acts; they can be honest mistakes by a research team that simply did not know better. You are there so that they will know better. And yes, of course I circled back to good old, classical Neyman fundamentals after starting this section with a Bayesian flavour. What can I say? I’m a frequentist.
Why we do what we do
In the previous sections, I wrote about how a significant challenge in my family’s life, one that we continue to deal with today, led me to propose a simple solution to an existing problem in ABA practice. I would likely never have encountered this issue, or proposed a solution, had we not needed ABA services for our child in the first place. This taught me another simple reality: you can’t solve everything. We can’t do everything, everywhere, all at once; we pick and choose depending on what we have, where we are, and what life throws at us. That is fine. We do what we do ultimately because that is what matters to us. Whether you love genomics, clinical experiments or causal inference in observational settings, we mainly affect the environments that mainly affect us. What should be common, regardless of what you love, is your love for finding truth in data. We do what we can to advance this ideal, and the rest, like unknown confounding variables, is beyond our control. It is my hope that the paper I wrote on informing performance criterion choices for ABA, now published4, will help practitioners help my child see a future where she can function as an independent adult. At the very least, I made sure to name the method after her because… why not?
References
1. Hume, K. (2008). Transition Time: Helping Individuals on the Autism Spectrum Move Successfully from One Activity to Another. Indiana Resource Center for Autism, Indiana University Bloomington. tinyurl.com/mrwj8kbk
2. Wong, K. K., Fienup, D. M., Richling, S. M., Keen, A., & Mackay, K. (2022). Systematic review of acquisition mastery criteria and statistical analysis of associations with response maintenance and generalization. Behavioral Interventions, 37(4), 993–1012. doi.org/10.1002/bin.1885
3. Ferraioli, S., Hughes, C., & Smith, T. (2005). A Model for Problem Solving in Discrete Trial Training for Children with Autism. Journal of Early and Intensive Behavior Intervention, 2(4), 224–246. doi.org/10.1037/h0100316
4. Ramos, M. L. F. (2025). MIEBL: Measurement of Individualized, Evidence-Based Learning Criteria Designed for Discrete Trial Training. Behavior Analysis in Practice. doi.org/10.1007/s40617-025-01058-9
Mark Louie F. Ramos is assistant professor in the Health Policy and Administration Department at Pennsylvania State University, USA.
