“Imagine you are flipping a coin 1,000 times. You get 537 heads. The probability of getting this result or “worse” using a fair coin is one percent. But you didn’t get 537 or worse, you got exactly 537 heads. The “odds” of getting this event are roughly 10 to 1 for a biased coin versus a fair coin.”

The above paragraph is in quotes to emphasize that the story is not original with me, nor even exactly correct. Something like it appeared in a New York Times article that I read in the paper version. I am quoting the gist of the sentences as best I remember, since the on-line article here does not contain them. I now know that the reasoning in the paragraph is just a succinct argument for the Bayesian view of statistical practice. I have, however, started to put two and two together.

The article refers to work done by Jeffrey N. Rouder of the University of Missouri and Richard D. Morey of the University of Groningen. Their paper can be found here, and therein can be found the 4 to 1 “odds” that the original article quoted (not the 10 to 1 that I made up).

Before I decided to research this matter, I tried to reproduce the 4 to 1 ratio using the coin flipping experiment. I used a spreadsheet to calculate the probability of getting 537 heads out of 1000 with a fair coin, and then calculated the probability of getting 537 heads for various biased coins with different probabilities of coming up heads. Since I vaguely knew that .537 is the maximum likelihood estimate for the coin’s actual heads probability, I weighted the biased-coin probabilities using a normal distribution (normal because having a maximum means there is a peak) centered at .537, and took the weighted average as an estimate of the probability of getting 537 heads with an unfair coin. By tinkering with the spread of the normal distribution I could get just about any ratio of the fair-coin probability to the unfair-coin probability. This was clearly unsatisfying, so I went searching on-line. I found a talk by Leonard Jimmie Savage that was apparently seminal, but it didn’t yield to a cursory reading. I then went back to the original article and the referenced paper.
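The spreadsheet experiment above can be sketched in a few lines of Python. This is my reconstruction, not the authors’ method: the binomial likelihood of 537 heads at p = 0.5 is compared against that likelihood averaged over a normal prior centered at .537 (truncated to [0, 1] and renormalized), and the choice of prior spread sigma is the knob that, as described above, lets you get just about any odds ratio you like.

```python
from math import comb, exp

N, K = 1000, 537          # flips and observed heads
C = comb(N, K)            # binomial coefficient (exact Python int)

def binom_pmf(p):
    """Probability of exactly K heads in N flips with heads-probability p."""
    return C * p**K * (1 - p)**(N - K)

def biased_vs_fair_odds(sigma, mu=0.537, steps=2001):
    """Average the likelihood over a truncated normal prior on p,
    then divide by the fair-coin likelihood."""
    grid = [i / (steps - 1) for i in range(steps)]
    weights = [exp(-(p - mu) ** 2 / (2 * sigma ** 2)) for p in grid]
    marginal = sum(w * binom_pmf(p) for w, p in zip(weights, grid)) / sum(weights)
    return marginal / binom_pmf(0.5)

# A narrow prior concentrated near .537 gives large odds against the fair
# coin; a wide, vague prior shrinks them toward 1 -- the sensitivity that
# made the spreadsheet exercise unsatisfying.
for sigma in (0.005, 0.02, 0.1):
    print(f"sigma = {sigma}: biased-vs-fair odds = {biased_vs_fair_odds(sigma):.1f}")
```

Running it shows the odds falling steadily as sigma grows, which is exactly the arbitrariness complained about above: without a principled prior, the “odds” are whatever the spread makes them.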

The upshot is that I think the p-value method of statistical reasoning is losing its dominance. The stark paragraph at the beginning of this post describes the theoretical reason, but I think “real life” is making an even stronger argument. Rouder and Morey use Bayesian analysis as a way of thinking about recently published evidence for ESP and conclude that the evidence is not as strong as classical statistics says it is. An article in The Atlantic magazine describes the work of Dr. John Ioannidis, who has shown that many well-designed medical studies whose conclusions had been translated into medical protocols are wrong. These men are calling into question the predominant statistical paradigm. They are making an assault on the p-value way of reasoning. I expect to see terms like “effect size” and “Bayes factor” creep into basic statistics textbooks any time now, if they haven’t already. And after that, the deluge.