Many researchers and laymen are familiar with the dreaded S word: statistics. It can mean many things: the point at which you begin to cry upon realizing that your results are not “significant”; a hopeless muddle as you attempt to work out how to analyse your data; or a meaningless concoction of numbers and notation that lets you nod your head sagely at whatever you are reading when it casually drops phrases such as “p-value”, “significant”, or “test statistic”. Come to think of it, to most people statistics may as well be voodoo. It allows you to feel intelligent without knowing what the hell you are talking about, and chances are other people won’t know either.
The biggest problem, I think, is that statistical thinking is first and foremost a type of thinking almost unique to the discipline. It contains math but is unlike most mathematical thinking; those well versed in the art of math are sometimes unprepared for the concepts of distribution, error, sampling, hypotheses, and statistics themselves. The primary goal of statistics is to make sense of data and probabilities: how do you calculate the probability of x occurring given that it follows a certain distribution? What does a probability even mean? And how do you make inferences about a population, say the height of people, when you can only take a random sample?
As someone with an advanced yet limited knowledge of the subject, the problems I see appear to stem from the fact that people do not understand how statistics works at the most basic level. And by basic I do not mean being able to apply a statistical test, get a result, and then say whether or not it is significant. That recipe is almost certainly bound for disaster. The fundamental questions researchers must ask themselves, and train their students to ask, are: what am I doing, and how will I get there?
Statistical theory or “philosophy” is broadly categorized into two schools: frequentist and Bayesian. Since I have very limited knowledge of the latter, I will focus on the former. Generally speaking, the two differ in how they treat probabilities. To a frequentist, a probability is a long-run frequency: saying an unbiased coin has a probability of 0.5 of landing heads means that, over many tosses, about half will come up heads. Of course, if someone actually tossed the coin one hundred times they might not get exactly fifty heads and fifty tails; to a frequentist, the more trials you conduct, the closer the observed frequency comes to the true probability of getting a head or a tail (i.e. 0.5). Bayesian analysis does not take this approach. To a Bayesian, probability reflects your inherent uncertainty about an event occurring. For example, saying the probability that it will rain tomorrow is 0.4 means that you are 40% certain it will rain tomorrow. Usually, Bayesian probabilities are expressed in relation to other information; for example, you might ask what the probability is that it will rain tomorrow given that it rained today. The useful thing about Bayesian statistics is that it lets you incorporate new pieces of information as you calculate probabilities, which cannot readily be done within the frequentist approach.
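The frequentist idea that the observed frequency approaches the true probability as trials accumulate is easy to demonstrate with a short simulation. This is a sketch of my own; the function name and the seed are arbitrary choices for illustration:

```python
import random

def heads_frequency(n_flips, seed=42):
    """Simulate n_flips fair-coin tosses and return the
    observed fraction of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The observed frequency wanders for small n but settles
# toward the true probability of 0.5 as n grows.
for n in (10, 100, 10_000, 1_000_000):
    print(n, heads_frequency(n))
```

With ten tosses you might see anything from 0.2 to 0.8; by a million tosses the frequency sits within a fraction of a percent of 0.5.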
Now the important thing to remember is that statistics, like any other field of science, is subjective to a certain degree. There may not be a single right way to analyse something, but there most certainly is a wrong way. We were told many stories in statistics class about hapless biologists (my field) who made trivial mistakes that could easily have been avoided if the biologists in question had an idea of what they were doing. Statistics is hard and complicated, but it yields many rewards when true insight is gained. Take, for example, hypothesis testing. In the simplest case, you are interested in whether the mean of some arbitrary population is, say, x. The population could be the number of spots on a leopard, and you want to know whether the mean is greater than 20. The great thing about statistics is that it allows you to make inferences about the population of leopards without sampling every individual in that population; all you need is a random sample (the caveat is in the word random: every effort must be made to reduce possible bias in your sampling method). You would then formulate your null hypothesis and alternative hypothesis. In our example, the null hypothesis is that the mean is 20, while the alternative hypothesis is that it is greater than 20. This can be written as:
H0: µ = 20
H1: µ > 20
The important thing is that the null hypothesis can never be what you are trying to show, because a frequentist hypothesis test evaluates the evidence against the null, which in this case is that the mean is 20. Now pretend that you have been outrageously lucky: you know for certain that your population (number of spots on a leopard) is normal, and a magical unicorn has told you the population variance. This allows you to undertake what is termed a Z-test (a statistical unicorn, if you excuse the pun). You set your significance level and calculate your test statistic after some brave soul has gone out and collected your data on leopard spots. This information then allows you either to reject the null hypothesis or not. Some people refer to rejecting the null hypothesis as “accepting” the alternative, or to failing to reject as “accepting” the null — this is not really correct, as you never “accept” a hypothesis; failing to reject merely means you have insufficient evidence against it, which is not the same thing as making a truth claim. The alternative hypothesis could be true even though your data do not let you reject the null, so it would be wrong to conclude that the null hypothesis is therefore true (failing to reject a false null is termed a Type II error, or false negative).
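The whole Z-test fits in a few lines once you write out the formula z = (x̄ − µ0)/(σ/√n). The sketch below is my own; the leopard-spot counts are made up, and the “known” standard deviation is the unicorn’s assumption:

```python
import math
from statistics import NormalDist

def z_test_upper(sample, mu0, sigma):
    """One-sided Z-test of H0: mu = mu0 against H1: mu > mu0,
    assuming a normal population with KNOWN standard deviation sigma."""
    n = len(sample)
    xbar = sum(sample) / n
    z = (xbar - mu0) / (sigma / math.sqrt(n))   # test statistic
    p = 1 - NormalDist().cdf(z)                 # upper-tail p-value
    return z, p

# Hypothetical leopard-spot counts; sigma = 3 courtesy of the unicorn.
spots = [23, 19, 25, 22, 27, 21, 24, 26, 20, 23]
z, p = z_test_upper(spots, mu0=20, sigma=3)
print(f"z = {z:.2f}, p = {p:.4f}")
# Reject H0 at the 5% significance level if p < 0.05.
```

Here the sample mean is 23, giving z = 3/(3/√10) = √10 ≈ 3.16 and a p-value well below 0.05, so we would reject H0 in favour of µ > 20.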
The fantastic thing about statistics is that once you know how it works you can apply it to any novel situation. The caveats rest on your assumptions. In the above example, if the number of spots on a leopard did not follow a normal distribution, then our conclusions would be patent nonsense. There are ways to test whether a sample is consistent with a certain distribution, but passing such a test does not mean the population definitely follows that distribution.
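One simple check is a QQ-style comparison: correlate the sorted sample against theoretical normal quantiles, where a correlation near 1 is consistent with normality (and, per the caveat above, still no guarantee). This is my own stdlib sketch, not a formal test such as Shapiro–Wilk, and the leopard-spot numbers are again made up:

```python
from statistics import NormalDist, mean, stdev

def normal_qq_correlation(sample):
    """Pearson correlation between the sorted sample and theoretical
    standard-normal quantiles (a crude QQ-plot check: values near 1
    are consistent with the sample coming from a normal distribution)."""
    xs = sorted(sample)
    n = len(xs)
    # Theoretical quantiles at plotting positions (i + 0.5) / n.
    qs = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, mq = mean(xs), mean(qs)
    cov = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    return cov / ((n - 1) * stdev(xs) * stdev(qs))

spots = [23, 19, 25, 22, 27, 21, 24, 26, 20, 23]
print(round(normal_qq_correlation(spots), 3))
```

A strongly skewed sample would drag the correlation well below 1; either way the check only suggests, never proves, that the normality assumption is reasonable.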
Statistics is as much an art as it is a science. Some people dislike its inherent uncertainty, but uncertainty and hand-waving exist in practically every field of science. How certain are we of our claims? What evidence supports my claims? Most importantly, what evidence does not support my claims? The most important thing in science is to get good data and analyse it correctly, making reasonable assumptions in order to do the analysis. Finally, statistical significance does not imply biological significance; it shows that you may have an interesting result and should definitely look into it. Remember: repeatability is respectability!