Sunday Nov 27, 2022

A recent article in the Journal of Statistics and Data Science Education reports on a study in which a cohort of students from a high-ranking Japanese university were asked to solve a series of problems relating to probability and statistics. The catch: the problems were chosen so that the most obvious common-sense answer would diverge from the correct solution provided by probability theory and statistics. Over and over, these highly intelligent, highly educated students were misled by their intuition and gave the wrong answer.

One may quibble over the appropriateness of the specific questions chosen and the conclusions drawn by the authors. But the problem they raise is a genuine one, and it is one of the primary reasons why learning and teaching statistics is so hard. We simply seem to be quite terrible at intuitive probabilistic thinking. Perhaps our poor human brains are just not built for this kind of thing. Or perhaps it is because statistical thinking skills are usually introduced very late in education.

However, there is a further dimension to the problem, and one that may be more easily addressed. Learning statistics will never be easy – but it shouldn’t be THIS hard. Most introductory courses are centred on a paradigm called Null-Hypothesis Significance Testing (NHST). Whenever you hear about P-values, confidence intervals or null hypotheses, you are working within this paradigm. It is a way of doing statistics that seems almost designed to work against our intuitions as humans and as scientists. Alternatives exist – for example, you can Google Richard McElreath for a different approach (as a side note – check out this piece of absolute madness that I just found).

As a statistics educator, one finds oneself in a tough spot. One feels obliged to teach NHST – because that is what students will encounter in most of the papers they read. A poor understanding of P-values is dangerous for a scientist, and so we must prioritise the concept, even if other approaches make more pedagogical sense. What’s more, P-values are useful – it is essential to know what kinds of patterns can be expected by chance alone, to provide context for the result obtained from our data. But P-values do not deserve centre stage in our final conclusions.

I can envisage a compromise: a course that would introduce statistical thinking starting from a Bayesian perspective – the primary competitor to the statistics of P-values, and arguably a way of thinking that is more in line with how scientists naturally reason about their data. Once the students are comfortable thinking about probabilities from this perspective, P-values can be introduced as well. I would wager that this would make for a gentler learning curve than starting from NHST, as most curricula currently do. None of what I have said is original in any way. Indeed, such arguments have been made for decades, but their real-life impact on introductory statistics courses has so far been quite weak. I am also convinced that curricula like the one I describe have been tried before – I would be grateful if anybody could point me to good examples. But it would be fun to try to put such a course together. I will do my best and I will report back on progress!