“Statistics,” as an old saying has it, sometimes “are used much like a drunk man uses a lamp post: for support, not illumination”. This sounds bad, but is it? And if so, why?
Scientists sometimes use statistics to support an argument because statistics appear to lend authority that might otherwise seem lacking. It may seem more convincing to attach p-values (or ΔAICs, or what have you) to quantitative statements that are already clearly sound; for example, if there are two treatments with 50 individuals each, and in one group 40 reach maturity while in the other only 10 do, a significance test comparing the treatments provides no new information. But sometimes editors or reviewers require such tests. This excessive use of statistics may make research sound falsely authoritative – and we think it should be avoided – but the harm is just that it can make it harder for readers to see the essence of the research because there is so much science-y sounding stuff in the paper. Bad, but not necessarily terrible.
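To see just how little such a test adds, here is a quick sketch using the counts from the example above (the data are the hypothetical ones in the text; a two-proportion z-test is only one of several tests a reviewer might ask for):

```python
from math import sqrt, erfc

# Counts from the example: 40 of 50 mature in one treatment,
# 10 of 50 in the other.
n1 = n2 = 50
x1, x2 = 40, 10
p1, p2 = x1 / n1, x2 / n2          # 0.8 vs 0.2

# Two-proportion z-test with a pooled standard error.
pooled = (x1 + x2) / (n1 + n2)     # 0.5
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se                 # 6.0
p_value = erfc(z / sqrt(2))        # two-sided p-value from the normal tail

print(f"z = {z:.1f}, p = {p_value:.2g}")
```

The p-value comes out astronomically small, which is exactly the point: the test "supports" a difference that was already obvious from the raw proportions.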
There have been big changes over the last decade or so in the way ecologists practice statistics. For example, we focus much less on null hypothesis significance testing and much more on estimation and on model selection. This entails an awareness that statistical analyses are modeling tools, and that in doing statistics we are looking for useful, well-supported models of reality. This is often a more natural way of thinking than hypothesis testing, with its focus on asking how probable a test statistic at least as extreme as the observed one would be if the (often certainly false) null hypothesis were true. This shift to model-centered statistics has also made it easier to move from requiring that all data be independent to including correlated data structures in our models. Generalized linear mixed models, spatial models, and many other useful tools have gained acceptance as a result.
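The model-selection mindset can be sketched with a toy ΔAIC comparison. The data below are hypothetical, the noise is a fixed symmetric pattern chosen for reproducibility, and the Gaussian AIC formula drops additive constants that cancel when comparing models:

```python
from math import log

# Hypothetical data: a response that rises with a predictor,
# plus a small fixed pattern of measurement noise.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
noise = [0.2, -0.2, 0.1, -0.1, 0.0, 0.0, -0.1, 0.1, -0.2, 0.2]
y = [3.0 + 0.5 * xi + e for xi, e in zip(x, noise)]

n = len(x)
mx = sum(x) / n
my = sum(y) / n

# Ordinary least-squares fit of the one-predictor model.
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = my - slope * mx

# Residual sums of squares for two candidate models.
rss_null = sum((yi - my) ** 2 for yi in y)                # intercept only
rss_line = sum((yi - (intercept + slope * xi)) ** 2
               for xi, yi in zip(x, y))                   # with predictor

# Gaussian AIC up to an additive constant: n*log(RSS/n) + 2k,
# where k counts estimated parameters (including the variance).
aic_null = n * log(rss_null / n) + 2 * 2
aic_line = n * log(rss_line / n) + 2 * 3

print(f"dAIC = {aic_null - aic_line:.1f}")
```

Rather than asking whether an almost-certainly-false null can be rejected, the ΔAIC asks which of two candidate models of reality is better supported by the data – here, overwhelmingly the one that includes the predictor.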
Several years ago Brian McGill warned of “statistical machismo” in ecology, suggesting that ecologists waste effort by using statistics that are more complicated than necessary. That may sometimes be a problem, but we think the bigger problem with using excessively complicated statistics is the potential for falsely convincing others (and ourselves) that our scientific argument is stronger than it really is. A second reason drunkards sometimes cling to lamp posts – we suppose – is that they think they can hide the instability of their strides. And sometimes they probably can, for a while. Scientists sometimes may do something similar: we use statistics to hide the instability of our arguments. When this occurs, the main problem is not that some effort is wasted, or that macho individuals swagger and boast about their statistics; it is that scientific conclusions may appear to be better supported than is warranted.
It may be inevitable that scientists rely on statistics for support in both senses – stabilizing shaky arguments as well as lending authority to them. After all, we need statistics partly because our ability to reliably discern pattern in a noisy world is quite limited. In this sense, hanging on to some external support can be part of a sensible strategy. But if you rely too strongly on external support – a lamp post or a flashy statistic – to do something that is inherently weak or unstable, then you are eventually likely to wind up on the ground, injured and embarrassed.
How can you reduce the frequency with which this occurs? Before beginning new statistical analyses, be certain of your conceptual understanding of the methods you intend to use. How do the statistical models relate to the scientific questions you wish to address? While there are often many methods to choose from for a particular data analysis, each asks a different question, and the resulting statistics have different meanings. In switching from one model to another, we may find that we have a more tractable statistical question, but are addressing a different scientific question. While we do not think that every ecologist must know the mathematics underlying the statistics they use, you really must understand the models you use in a qualitative way. A useful rule of thumb is to ask whether you can explain your statistical methods to a scientist untrained in them.
What resolution does your analysis require? Ecology is complicated, and many different things may determine outcomes – but that doesn’t mean they must all be part of your statistical model. For example, if you are studying individual growth rates of trees, some scientific questions may be adequately addressed by considering a few physical factors, like moisture availability or soil nitrate concentrations. Some questions might require that you also consider density of neighbors. Still other questions might require that you consider how closely related those neighbors are. But which of these factors are actually needed to address the questions of interest? Using more complex statistics is not a virtue by itself; if older methods like linear regression are adequate (and appropriate, given the data), use them!
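When a simple model is adequate, it also tends to be transparently checkable. Here is a minimal sketch, using invented tree-growth numbers (the moisture values, coefficients, and noise pattern are all hypothetical), of fitting plain linear regression and asking how much of the variation it already explains:

```python
# Hypothetical data: individual tree growth vs. soil moisture, with
# a small fixed pattern of measurement noise.
moisture = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
noise = [0.2, -0.2, 0.1, -0.1, 0.0, 0.0, -0.1, 0.1, -0.2, 0.2]
growth = [2.0 + 1.5 * m + e for m, e in zip(moisture, noise)]

n = len(moisture)
mx = sum(moisture) / n
my = sum(growth) / n

# Closed-form ordinary least-squares estimates.
slope = (sum((m - mx) * (g - my) for m, g in zip(moisture, growth))
         / sum((m - mx) ** 2 for m in moisture))
intercept = my - slope * mx

# How much of the variation does one predictor explain?
rss = sum((g - (intercept + slope * m)) ** 2
          for m, g in zip(moisture, growth))
tss = sum((g - my) ** 2 for g in growth)
r_squared = 1 - rss / tss

print(f"growth = {intercept:.1f} + {slope:.1f} * moisture, R^2 = {r_squared:.3f}")
```

If a fit like this answers the scientific question, adding neighbor density or relatedness to the model buys complexity, not insight.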
Useful empirical research involves coherence between three things: data, statistics, and mechanistic explanation. The statistical part involves more than underlining statements about observed patterns with something like “p < 0.0001;” it involves modeling the data, and sometimes modeling the underlying biological process thought to generate the data. Publishing meaningful research – as well as interpreting others’ research – requires a clear conceptual understanding of the models. The alternative is believing things we really don’t quite understand, and as Stevie Wonder once put it, “When you believe in things you don’t understand, then you suffer. Superstition ain’t the way.”
Feature Image credit: Lamp post by MichaelMaggs. CC BY-SA 3.0 via Wikimedia Commons.