BrothersJudd Blog: EVIDENCE IS A HARSH TASKMASTER:

July 4, 2018

Thomas Bayes and the crisis in science (DAVID PAPINEAU, 6/28/18, Times Literary Supplement)

What interested him was not the probability of results given different causes (like the probability of five heads given different kinds of coin). Rather he wanted to know about the "inverse probability" of the causes given the results. When we observe some evidence, what's the likelihood of its different possible causes? Some commentators have conjectured that Bayes's interest in this issue was prompted by David Hume's sceptical argument in An Enquiry Concerning Human Understanding (1748) that reports of miracles are more likely to stem from inventive witnesses than the actions of a benign deity. Be that as it may, Bayes's article was the first serious attempt to apply mathematics to the problem of "inverse probabilities".

Bayes's paper analyses a messy problem involving billiard balls and their positions on a table. But his basic idea can be explained easily enough. Go back to the coins. If five tosses yield five heads in a row, then how likely is it that the coin is fair rather than biased? Well, how long is a piece of string? In the abstract, there's no good answer to the question. Without some idea of the prevalence of biased coins, five heads doesn't really tell us anything. Maybe we're spinning a dodgy coin, or perhaps we just got lucky with a fair one. Who knows?

What Bayes saw, however, was that in certain cases the problem is tractable. Suppose you know that your coin comes from a minting machine that randomly produces one 75 per cent heads-biased coin for every nine fair coins. Now the inverse probabilities can be pinned down. Since five heads is about eight times more likely on a biased than a fair coin, we'll get five heads from a biased coin eight times for every nine times we get it from a fair one. So, if you do see five heads in a row, you can conclude that the probability of that coin being biased is nearly a half. By the same reasoning, if you see ten heads in a row, you can be about 87 per cent sure the coin is biased. And in general, given any observed sequence of results, you can work out the probability of the coin being fair or biased.
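The arithmetic in the minting-machine example can be checked with a short sketch. It assumes exactly what the passage states (a 75 per cent heads-biased coin, one biased coin for every nine fair ones); the function name `posterior_biased` and its parameter names are illustrative only, not taken from the article.

```python
from fractions import Fraction


def posterior_biased(n_heads,
                     p_heads_biased=Fraction(3, 4),
                     prior_biased=Fraction(1, 10)):
    """Probability the coin is biased after seeing n_heads heads in a row.

    Assumes only two kinds of coin exist: a fair one (heads half the time)
    and a biased one (heads with probability p_heads_biased).
    """
    p_heads_fair = Fraction(1, 2)
    # Likelihood of the run of heads under each hypothesis.
    like_biased = p_heads_biased ** n_heads
    like_fair = p_heads_fair ** n_heads
    # Weight each likelihood by how common that kind of coin is.
    joint_biased = like_biased * prior_biased
    joint_fair = like_fair * (1 - prior_biased)
    return joint_biased / (joint_biased + joint_fair)


# How much more likely five heads is on the biased coin than on the fair one.
likelihood_ratio = Fraction(3, 4) ** 5 / Fraction(1, 2) ** 5

five = posterior_biased(5)
ten = posterior_biased(10)

print(float(likelihood_ratio))  # roughly 7.6, the "about eight times"
print(float(five))              # roughly 0.46, "nearly a half"
print(float(ten))               # roughly 0.865, close to the quoted 87 per cent
```

Exact fractions are used so that the rounding in the prose is visible: the likelihood ratio is 243/32 and the posterior after five heads is 27/59.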

Most people who have heard of Thomas Bayes associate him primarily with "Bayes's theorem". This states that the probability of A given B equals the probability of B given A, times the probability of A, divided by the probability of B. So, in our case, Prob(biased coin/five heads) = Prob(five heads/biased coin) x Prob(biased coin) / Prob(five heads).

As it happens, this "theorem" is a trivial bit of probability arithmetic. (It falls straight out of the definition of Prob(A/B) as Prob(A&B) / Prob(B).) Because of this, many dismiss Bayes as a minor figure who has done well to have the contemporary revolution in statistical theory named after him. But this does a disservice to Bayes. The focus of his paper is not his theorem, which appears only in passing, but the logic of learning from evidence.
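The parenthetical step can be spelled out in two lines. In the derivation below, \(\mathrm{Prob}(A \mid B)\) stands for the article's Prob(A/B) and \(A \wedge B\) for A&B; the requirement that both Prob(A) and Prob(B) be nonzero is an assumption added here so that the divisions make sense.

```latex
\begin{align*}
\mathrm{Prob}(A \mid B) &= \frac{\mathrm{Prob}(A \wedge B)}{\mathrm{Prob}(B)}
  && \text{(definition)} \\
\mathrm{Prob}(B \mid A) &= \frac{\mathrm{Prob}(A \wedge B)}{\mathrm{Prob}(A)}
  \;\Longrightarrow\;
  \mathrm{Prob}(A \wedge B) = \mathrm{Prob}(B \mid A)\,\mathrm{Prob}(A)
  && \text{(same definition, roles swapped)} \\
\mathrm{Prob}(A \mid B) &= \frac{\mathrm{Prob}(B \mid A)\,\mathrm{Prob}(A)}{\mathrm{Prob}(B)}
  && \text{(substitute into the first line)}
\end{align*}
```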

What Bayes saw clearly was that, in any case where you can compute Prob(A/B), this quantity provides a recipe for adjusting your confidence in A when you learn B. We start off thinking there's a one-in-ten chance of a biased coin but, once we observe five heads, we switch to thinking it's an even chance. Bayes's "theorem" is helpful because it shows that evidence supports a theory to the extent the theory makes that evidence likely - five heads support biasedness because biasedness makes five heads more likely. But Bayes's more fundamental insight was to see how scientific methodology can be placed on a principled footing. At bottom, science is nothing if not the progressive assessment of theories by evidence. [...]
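The "recipe for adjusting your confidence" can be sketched as a loop that revises the belief after each toss. This is only an illustration under the coin example's assumptions (a one-in-ten prior, a 75 per cent heads-biased coin); the function name `update` and the variable names are hypothetical.

```python
from fractions import Fraction


def update(prior_biased, saw_heads, p_heads_biased=Fraction(3, 4)):
    """Return the revised probability that the coin is biased after one toss."""
    p_heads_fair = Fraction(1, 2)
    # Probability of the observed outcome under each hypothesis.
    like_biased = p_heads_biased if saw_heads else 1 - p_heads_biased
    like_fair = p_heads_fair if saw_heads else 1 - p_heads_fair
    # Overall probability of the outcome, averaged over both kinds of coin.
    evidence = like_biased * prior_biased + like_fair * (1 - prior_biased)
    return like_biased * prior_biased / evidence


belief = Fraction(1, 10)  # one-in-ten chance of a biased coin to begin with
history = [belief]
for _ in range(5):        # observe five heads, one at a time
    belief = update(belief, saw_heads=True)
    history.append(belief)

for tosses, b in enumerate(history):
    print(tosses, round(float(b), 3))
```

Each head nudges the belief upward (1/10, 1/7, 1/5, 3/11, 9/25, 27/59), and the final value matches what a single update on all five heads would give, which is the roughly even chance the passage describes.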

Science is currently said to be suffering a "replicability crisis". Over the last few years a worrying number of widely accepted findings in psychology, medicine and other disciplines have failed to be confirmed by repetitions of the original experiments. Well-known psychological results that have proved hard to reproduce include the claim that new-born babies imitate their mothers' facial expressions and that will power is a limited resource that becomes depleted through use. In medicine, the drug companies Bayer and Amgen, frustrated by the slow progress of drug development, discovered that more than three-quarters of the basic science studies they were relying on didn't stand up when repeated. When the journal Nature polled 1,500 scientists in 2016, 70 per cent said they had failed to reproduce another scientist's results.

This crisis of reproducibility has occasioned much wringing of hands. The finger has been pointed at badly designed experiments, not to mention occasional mutterings about rigged data. But the only real surprise is that the problem has taken so long to emerge. The statistical establishment has been reluctant to concede the point, but failures of replication are nothing but the pigeons of significance testing coming home to roost.

Posted by Orrin Judd at July 4, 2018 6:40 AM
