I sometimes wonder how the future might have turned out if R.A. Fisher had suggested that the 1% significance cut-off be used in research instead of the standard 5% cut-off. To the uninitiated, such “tests of significance” are when “we can examine whether or not the data are in harmony with any suggested hypothesis.” One is testing a null hypothesis (H0) against an alternative hypothesis (H1), e.g.:
H0: There is no effect of x on y
H1: There is an effect of x on y
There are many different hypothesis tests one can run, and which is appropriate usually depends on the nature of the data and the underlying assumptions one can make. It is commonly known that a cut-off value of 0.05 indicates that a result falling below it is “significant”, i.e. unlikely to have been produced solely by chance. That does not automatically imply that the result is significant with respect to your research question (is it significant in nature?), and many an erroneous conclusion has been drawn in papers on this false assumption. But what bearing does R.A. Fisher’s conception of such “p-values” have on the replication crisis in psychology?
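To make the abstract H0/H1 setup concrete, here is a minimal sketch of one such test: a two-sided permutation test for a difference in group means, written in plain Python. The data are invented for illustration; the idea is simply that under H0 the group labels are exchangeable, so we ask how often random relabellings produce a difference at least as large as the one we observed.

```python
import random

def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Under H0 (no effect), group labels are exchangeable, so we
    shuffle the pooled data and count how often the shuffled mean
    difference is at least as extreme as the observed one.
    Returns that fraction: the p-value.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a = pooled[:len(a)]
        perm_b = pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            count += 1
    return count / n_perm

# Hypothetical measurements of y, with and without exposure to x:
control = [4.8, 5.1, 4.9, 5.0, 5.2, 4.7, 5.0, 4.9]
treated = [5.6, 5.9, 5.4, 5.8, 5.5, 6.0, 5.7, 5.6]

p = permutation_test(control, treated)
print(f"p = {p:.4f}")  # below 0.05, so we would reject H0 at the 5% level
```

Note that rejecting H0 here says only that the observed difference is unlikely under pure chance; as above, it says nothing about whether the effect matters in nature.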
Psychology has been in crisis for some time now, to say nothing of outright fraudulent research, which merits a separate article. Psychologist Brian Nosek and collaborators attempted to replicate 100 important results in psychology in a bid to make science more transparent; it would be an understatement to say that their results were not good for the field. In their own words:
“Ninety-seven percent of original studies had significant results (P < .05). Thirty-six percent of replications had significant results; 47% of original effect sizes were in the 95% confidence interval of the replication effect size; 39% of effects were subjectively rated to have replicated the original result; and if no bias in original results is assumed, combining original and replication results left 68% with statistically significant effects.”
As the authors note, it is difficult to say when a result has been replicated, that is, when it agrees with the original findings. This result is not new. In 2005, statistician John Ioannidis published a paper in PLoS Medicine with the startling title “Why Most Published Research Findings Are False”. The basis for this claim goes back to p-values, or rather to the fallacy of judging work by a useful, but flawed, statistical measure.
Let us pretend that we are attempting to find out whether there is an effect between events x and y. Before we undertake the study, we work under the epistemological assumption that there is some “true” effect between x and y which can only be ascertained by conducting an experiment and analysing the data. This “true” effect is something of a black box; we know that it exists, but we can never be “certain” that our experiment will uncover it. As Ioannidis points out: “As has been shown previously, the probability that a research finding is indeed true depends on the prior probability of it being true (before doing the study), the statistical power of the study, and the level of statistical significance.”

Statistical power is a measure of a study’s ability to detect an effect, and it stands in an antagonistic relationship to the significance threshold. If one tightens the significance cut-off (say, from 5% to 1%) in order to reduce the chance of a finding being a “false positive”, one must also accept reduced power to detect a real effect, since one has now increased the chance of a “false negative”. In science, there are no free lunches. This trade-off is largely why Fisher settled on the figure of a 5% significance cut-off, which he saw as keeping false positives to an acceptable level while retaining reasonable statistical power to detect a difference, if one exists. In fact, Fisher highlighted many of the problems associated with finding “true” effects in Statistical Methods for Research Workers.
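Ioannidis’s dependence on prior probability, power, and significance level can be sketched as a short calculation. The function below (an illustrative rendering, assuming no bias, following the framing of his paper) gives the probability that a “significant” finding reflects a true effect:

```python
def positive_predictive_value(prior, power, alpha):
    """Probability that a statistically significant finding is true,
    given the prior probability of the effect, the study's power,
    and the significance cut-off (assuming no bias).
    """
    true_positives = power * prior          # real effects we detect
    false_positives = alpha * (1 - prior)   # chance findings where no effect exists
    return true_positives / (true_positives + false_positives)

# A long-shot hypothesis (10% prior) tested with 80% power at the 5% cut-off:
print(round(positive_predictive_value(prior=0.1, power=0.8, alpha=0.05), 3))
# The same hypothesis at a 1% cut-off, even with power reduced to 50%:
print(round(positive_predictive_value(prior=0.1, power=0.5, alpha=0.01), 3))
```

On these hypothetical numbers, the stricter 1% cut-off yields a markedly higher chance that a significant result is real, even after paying for it in power, which is exactly the trade-off the opening paragraph wondered about.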
Science is very hard. When one has had to claw one’s way to a degree, it is easy to see why some psychologists are extremely upset that their findings may not actually be “true”. Slate recently published an article on a textbook psychology finding which failed to replicate. The author of the original study, Fritz Strack, instead of embracing the findings, found ways to dismiss the replication. I think it is important to realise that if you are in the business of doing science, there are some startling facts one must learn, including that the workings of nature do not depend on one’s feelings or academic reputation. As a researcher, I find that doing good work requires a certain emotional investment. The downside is that any criticism, even a very reasonable and fair examination of your work, can feel like an attack on your personal integrity. But it is necessary. Replication is necessary. It is far worse to go on believing something which is false, simply because you desire it to be true, than for the truth, however rudely it may appear, to become known. Otherwise, we may as well all be stamp collectors.