My current screen saver is this graphic:
I found it on Andrew Gelman’s blog here. The red areas are based on an alpha (the probability of rejecting a null hypothesis that is actually true) of 0.05. A recent paper (preprint) endorsed/authored by a fat paragraph of statisticians recommends an alpha of 0.005 for “new discoveries.” I wanted to know how Dr. Gelman’s graph would change.
First, here is my rendition of the original:
This is a normal curve with mean 2 and standard deviation (standard error) 8.1. It represents the true state of the world and is being compared to the null hypothesis with mean 0 and standard deviation 8.1. Under the null hypothesis with an alpha of 0.05, the rejection areas (in red) begin at plus or minus 1.96 times 8.1, or about ±15.9. Effect size here is the true mean divided by the standard error, in this case 2/8.1, or about 0.25. The other information on the graph comes from a slight modification of the R code in this article. Power is the probability that an effect will be detected, that is, that the estimate lands in the red areas. This probability is 0.057, or about 6 percent. Though the “real” mean is positive (2), 24% of the red area is to the left, where the estimate is negative. This is the type S (sign) error rate. The exaggeration ratio is obtained by simulating the process with mean 2 and standard deviation 8.1 ten thousand times, averaging the absolute values of those estimates that land in the red areas, and dividing that average by the true mean of 2. Here the exaggeration ratio is about nine.
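The simulation described above can be sketched in a few lines. This is a Python approximation of the procedure, not the R code from the article; the function name `retrodesign_sim`, its parameters, and the seed are my own choices for illustration.

```python
import random
from statistics import NormalDist, fmean

def retrodesign_sim(true_mean, se, alpha=0.05, n_sims=10_000, seed=1):
    """Simulate power, type S rate, and exaggeration ratio.

    Hypothetical helper: draws n_sims estimates from Normal(true_mean, se)
    and keeps those that fall in the two-sided rejection areas.
    Assumes at least one draw is significant.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    cutoff = z * se                           # edge of the red areas
    rng = random.Random(seed)
    draws = [rng.gauss(true_mean, se) for _ in range(n_sims)]
    significant = [x for x in draws if abs(x) > cutoff]

    power = len(significant) / n_sims
    # Type S: share of significant estimates whose sign is wrong.
    type_s = sum(1 for x in significant if x * true_mean < 0) / len(significant)
    # Exaggeration: mean absolute significant estimate over the true mean.
    exaggeration = fmean([abs(x) for x in significant]) / abs(true_mean)
    return power, type_s, exaggeration

power, type_s, exaggeration = retrodesign_sim(true_mean=2, se=8.1)
print(f"power={power:.3f}  type S={type_s:.2f}  exaggeration={exaggeration:.1f}")
```

Because only a few hundred of the ten thousand draws land in the red areas, the type S rate and exaggeration ratio wobble a little from seed to seed.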
Next I created this graph using the same methods with alpha = .005.
The rejection areas are barely visible. Finding a small effect using this method will occur much less often, but if we do “find” it, there will still be a fair chance of getting the sign wrong and the effect will be much exaggerated.
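To compare the two alpha levels without simulation noise, the same three quantities can be computed in closed form from the normal distribution. The sketch below is my own, not the method used for the graphs; `retrodesign_exact` is a hypothetical name, and it assumes a positive true mean and a two-sided test.

```python
from statistics import NormalDist

def retrodesign_exact(true_mean, se, alpha):
    """Closed-form power, type S rate, and exaggeration ratio.

    Hypothetical helper; assumes true_mean > 0 and a two-sided test
    of a normally distributed estimate with known standard error.
    """
    std = NormalDist()
    cutoff = std.inv_cdf(1 - alpha / 2) * se
    a = (cutoff - true_mean) / se     # standardized upper cutoff
    b = (-cutoff - true_mean) / se    # standardized lower cutoff
    p_hi = 1 - std.cdf(a)             # chance of landing in the right red area
    p_lo = std.cdf(b)                 # chance of landing in the left red area
    power = p_hi + p_lo
    type_s = p_lo / power
    # E[|X|; X in rejection areas], from truncated-normal means.
    abs_sum = true_mean * (p_hi - p_lo) + se * (std.pdf(a) + std.pdf(b))
    exaggeration = (abs_sum / power) / true_mean
    return power, type_s, exaggeration

results = {alpha: retrodesign_exact(2, 8.1, alpha) for alpha in (0.05, 0.005)}
for alpha, (power, type_s, exaggeration) in results.items():
    print(f"alpha={alpha}: power={power:.4f}  "
          f"type S={type_s:.2f}  exaggeration={exaggeration:.1f}")
```

With the stricter alpha, power falls well below one percent while the exaggeration ratio rises, which is the pattern visible in the second graph.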
All this is to say that it is hard to find a small effect in noisy data, and if you do, the results are likely to be deceptive. This is why scientific experimenters spend much of their time controlling for and eliminating extraneous variables. And yet sometimes discovering small effects can be important. Imagine discovering a drug that cures 0.1 percent of people with a common disease, say 0.1 percent of 10,000,000, which would mean curing 10,000 people. The small effect exists but we can’t find it using reasonable sample sizes with statistical methods alone. This argues for putting more effort into understanding biological mechanisms and working to identify special subpopulations.