Elementary, my dear Watson: a story of reasonable doubt

An embarrassing personal experience impressed on me the lesson that research is a human enterprise that can be clouded by bias, even when conducted by a competent party. Face validity and reasonable doubt are important tenets of research, serving as the reality check and litmus test for a phenomenon under examination. Reasonable doubt serves to qualify the status quo; it represents the default setting and informs the presumed condition of the null hypothesis.

Face validity concerns whether 'particular empirical measures may or may not jibe with our common agreements and our individual mental images concerning a particular concept'¹. The American legal system’s premise of 'innocent until proven guilty' offers a familiar example, along with the related notions of the 'burden of proof' and the 'reasonable person'. Thus we can frame the null hypothesis as the presumption of innocence and position the research hypothesis as the party that bears the burden of proof.

I bought some high-fidelity earbuds for my iPod Shuffle last year and use them several times a week. However, the more I wore them, the more something bothered me. It was not an auditory foible or a performance issue per se, but that in the process of installing them in my ears I experienced an improbable recurrence. It was so unlikely to me – being a teacher of statistics – that I became irrationally superstitious about it and did not report the phenomenon to anybody for fear of breaking the ridiculous unbroken streak of occurrences. Every time I unraveled the cord with my left hand and then used my right hand to select an earbud to put in first, it was always the one marked with an 'R'. Finally, a few days ago I went to put them on, expecting to be flummoxed yet again at the consistency of picking the 'R' earbud, when I just happened to hold both of the earbuds in a certain way. For the first time, I regarded them from an omniscient perspective and was awakened. They are marked the same: both have an 'R' on them despite being shaped differently for the ergonomics of each ear.

After my hysterics and humiliation subsided, it was time for reflection. I knew perfectly well that no matter how many trials accumulated, the probability of picking the 'R' earbud on any given trial would remain 50% if the earbuds were marked 'L' and 'R' and there were no other variables that would consistently bias the one that I grabbed. Yet I had a 100% 'R' sample proportion after dozens of trials. I am ambidextrous and the way I uncoiled and recoiled the earbuds would not have had an effect, especially as the darned things always got tangled in their storage pouch. The elementary lesson of coin-flipping probability imparts that no matter how many times in a row the coin lands heads up, the cosmic accountants do not owe me a 'tails' flip any more than I was due to select the 'L' earbud the next time.
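To put a rough number on how improbable the streak was under the null hypothesis, here is a minimal sketch. It assumes each pick is an independent, fair 50/50 draw between a correctly marked 'L' and 'R' earbud; the function name and the trial counts are illustrative assumptions, since I never logged exactly how many times I picked.

```python
from fractions import Fraction


def streak_probability(n_trials: int, p_right: Fraction = Fraction(1, 2)) -> Fraction:
    """Probability of picking the 'R' earbud on every one of n_trials picks.

    Assumes independent picks with a constant chance p_right of grabbing 'R'
    (the null hypothesis of correctly marked earbuds and no handling bias).
    """
    if n_trials < 0:
        raise ValueError("n_trials must be non-negative")
    return p_right ** n_trials


if __name__ == "__main__":
    # Hypothetical trial counts standing in for "dozens of trials".
    for n in (1, 5, 10, 24, 36):
        p = streak_probability(n)
        print(f"{n:>2} picks in a row: {p} (about {float(p):.2e})")
```

The probability of the next pick stays at one half however long the streak runs, but the probability of the whole streak shrinks by half with every trial. A result that tiny is a cue to doubt the assumptions, including the measurement itself, before crediting luck.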

There were two confounding elements to this situation that should have given any statistician pause for thought. First, there were times that the earbuds were uncomfortable as I wore them. Second, some songs that I knew fairly well from past listening sounded indeterminately different. Both were difficult to investigate, which encouraged improbable and empirically lazy explanations. For instance, I dismissed the occasional physical discomfort of the earbuds as an inconsistency in my own physicality; why shouldn’t the furls of my ear cartilage change day to day? I use my device only in shuffle mode and the order of tracks is random. Scanning back to analyze a familiar song that did not sound right was technologically difficult and often impractical in my normal usage context (e.g. riding my bike on a rough trail). Even if I had pinned the stereo mixing as suspect, I would have been unable to validate it: my orientation to my home speakers is inconsistent, so I had no reliable prior observations to compare against. I do not have enough of an aurally eidetic memory to catch when, say, the bass should thump on the left and the horns should blast on the right.

In this case, reasonable doubt failed to overcome the awe of my empirical observations and make the violation of face validity obvious. Good research should always keep reasonable doubt at the fore; it is part of the regular and necessary consideration of face validity in any experiment. Theories attribute stock market performance to lunar cycles, athletes repeatedly wear filthy 'lucky socks' to maintain a winning streak, and gamblers perform elaborate rituals when throwing dice in order to increase their chances of success. The irony is that assessing face validity and employing reasonable doubt – being qualities of a human being’s innate rational faculty that are inexplicably simple – are often suspended when our preferences are better served by the completely illogical explanation of luck. This could be construed as a nice testament to the power of hope and the expectation that underdogs can triumph in the face of insurmountable odds, but that is an injustice to our more useful faith in the explanatory powers of logic.

The mystery of my earbuds was not a statistical anomaly; it was merely a production mistake. A quick shot of reasonable doubt and a test of face validity could have revealed that the problem in my analysis lay in my own measurement of the data. I assumed that the indicator of an earbud’s orientation would be completely encompassed in its marking as 'L' or 'R' and that each one of the pair would have a unique marking. Statistics textbooks do not take special pains to discuss the chance that the coin in the probability module was not correctly minted with two distinct sides. In fact, the predominant theme when it comes to worries over bias in research is the common adage that 'statistics don’t lie, but liars use statistics.' Who could imagine that my little earbuds were dishonest?

The grand lesson of this story is that reasonable doubt and face validity are essential parts of statistical analyses. They serve to incorporate a uniquely fantastic and effective human capacity for detecting improbability that is more nuanced than any mathematical filtering algorithm. Reasonable doubt and face validity as elements of investigative practice are supported throughout the history of knowledge. Great minds like Thomas Aquinas, William of Ockham (Occam’s razor), Isaac Newton, and Bertrand Russell all offered some formulation of the logical postulate that the simplest, most parsimonious and obvious explanation is the most likely. Appropriately, the most elementary phrasing comes from Sherlock Holmes²: 'When you have eliminated the impossible, whatever remains, however improbable, must be the truth.'