Oxford University Press's
Academic Insights for the Thinking World

Correlation is not causation

By Stephen Mumford and Rani Lill Anjum

Causation as correlation

A famous slogan in statistics is that correlation does not imply causation. We know that there is a statistical correlation between eating ice cream and drowning incidents, for instance, but ice cream consumption does not cause drowning. Where any two factors, A and B, are correlated, there are four possibilities: (1) A is a cause of B; (2) B is a cause of A; (3) the correlation is pure coincidence; and (4), as in the ice cream case, A and B are connected by a common cause. Increased ice cream consumption and drowning rates both have a common cause in warm summer weather.
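The common-cause case can be made concrete with a small simulation. The sketch below is illustrative only: the temperature range, the coefficients, and the noise levels are invented assumptions, not real data. Ice cream sales and drownings each depend on temperature but never on each other, yet they come out correlated; once the effect of temperature is removed from both, the correlation all but disappears.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

def simulate(days=5000, seed=0):
    """Hypothetical daily data: temperature drives both other variables."""
    rng = random.Random(seed)
    temps, ice_cream, drownings = [], [], []
    for _ in range(days):
        t = rng.uniform(0, 30)                       # assumed temperature range
        temps.append(t)
        ice_cream.append(2.0 * t + rng.gauss(0, 10))  # depends on t only
        drownings.append(0.1 * t + rng.gauss(0, 1))   # depends on t only
    return temps, ice_cream, drownings

temps, ice_cream, drownings = simulate()

# Raw correlation between the two effects of the common cause.
raw = pearson(ice_cream, drownings)

# Correlation after subtracting what temperature explains in each.
controlled = pearson(residuals(ice_cream, temps), residuals(drownings, temps))

print(f"raw correlation:          {raw:.2f}")
print(f"controlling for weather:  {controlled:.2f}")
```

Under these assumptions the raw correlation is clearly positive, while the controlled one sits close to zero. That is the signature of possibility (4) rather than (1) or (2).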

Nevertheless, there is a prominent philosophical view in which correlation and causation are brought very close together. David Hume (1711-1776), in A Treatise of Human Nature, argued that causation is little more than correlation. All we know is that the cause and effect regularly and constantly occur together, that the cause happens before the effect, and that they occur next to each other in space. It seems, then, that to Hume correlation is sufficient to infer causation, as long as the other two conditions are met.

Why correlations?

Why assume that causation is linked to correlations at all? No correlation discovered in science is a perfect one, after all. There is no ‘constant conjunction’, as Hume called it.

We know that smoking causes cancer. But we also know that many people who smoke don’t get cancer. Causal claims are not falsified by counterexamples, not even by a whole bunch of them. Contraceptive pills have been shown to cause thrombosis, but only in 1 in 1,000 women. Following Popper, we could say that for every case where the cause is followed by the effect there are 999 counterexamples. Instead of falsifying the hypothesis that the pill causes thrombosis, however, we list thrombosis as a known side-effect. Causation is still very much assumed even though it occurs only in rare cases.
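The 1-in-1,000 figure can be turned into a rough worked example of how weak a correlation a genuine cause may produce. The sketch below uses the phi coefficient (the Pearson correlation for two yes/no variables) on a hypothetical population. The group sizes and the zero rate of thrombosis among women not on the pill are assumptions chosen purely for illustration, not medical data.

```python
from math import sqrt

def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2x2 table.

    a: exposed, effect present     b: exposed, effect absent
    c: unexposed, effect present   d: unexposed, effect absent
    """
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator

group_size = 1_000_000               # assumed: equal-sized groups on and off the pill
cases_on_pill = group_size // 1000   # 1 in 1,000, the figure quoted in the text
cases_off_pill = 0                   # assumed for illustration only

phi = phi_coefficient(
    cases_on_pill, group_size - cases_on_pill,
    cases_off_pill, group_size - cases_off_pill,
)
print(f"phi = {phi:.3f}")
```

Even in this idealized setup, where every single case is due to the pill, the correlation comes out at only about 0.02. A real causal link is compatible with a correlation that looks negligible.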

Correlations are usually now thought of as coming in various strengths. If one changes one variable x, and another, y, regularly changes with it, we take it to indicate some kind of causal connection. But do even these, sometimes weak, correlations constitute causation? Or are they mere signs of it?

Causes as tendencies

Perhaps we need to look for what would be behind such correlations. One could understand a cause, for instance, as a tendency towards its effect. Smoking has a tendency towards cancer, but it doesn’t guarantee it. Contraceptive pills have a tendency towards thrombosis but a relatively small one. However, being hit by a train strongly tends towards death. We see that tendencies come in degrees, as do causes, some strongly tending towards their effect and some only weakly.

An essential feature of causation seems to be that an effect can be counteracted additively: by adding something to the situation that tends away from the outcome. We use seat belts, fire alarms, motorcycle helmets, and a number of other security systems, all in the hope of preventing or at least minimizing the effect should the cause happen.

If we believed that a cause was always and necessarily correlated with its effect, there would be no point in trying to interfere additively. All we could then do to prevent an outcome is to make sure that the cause never happens. Let’s say we wanted to avoid high blood pressure. Instead of taking medication, we could remove one or more of the causal factors tending towards high blood pressure: salt, stress, smoking, fat, and so on. We could call this subtractive interference.

Causation in science

Does it matter to science whether we link causation to correlations or tendencies? Arguably so. If we look for causation through correlation data, there might be some tendencies that are too weak to count as scientifically significant. Does this mean that causation is not established?

There are two ideas of causation that seem to go in opposite directions. One is that causation requires robust correlations. The other is that all causal laws are true only ceteris paribus: under ideal conditions. So while most laws of physics are concerned with what happens in vacua, free from any disturbing factors, the world we live in is admittedly not like this.

The first idea, of robust correlation, suggests that if the cause occurs in a variety of contexts, the effect should still occur. This is important when we use statistics to look for causes. The second idea, however, suggests that contextual variation would affect causation. In experiments, for instance, we observe the cause under different conditions to see how it changes the outcome.

Correlation is not causation

If causes are tendencies that can be counteracted by other tendencies, this should change the way we think of causation, away from Hume’s idea of causation as constant conjunction.

Rather than being thought of as indicative of causation, robust correlations should be taken as evidence for something other than causation. Identity, classification and essence are typical candidates for robust correlations. All water is H2O, all whales are mammals and all humans are mortal. Such truths are not subject to interference and the first thing is always correlated with the second. They do not need ceteris paribus qualifying. In contrast, causal truths share none of these features.

Correlation does not imply causation. At best it might be taken as indicative or symptomatic of it. And perfect correlation, if this is understood along the lines of Hume’s constant conjunction, does not indicate causation at all but probably something quite different.

Stephen Mumford is Professor of Metaphysics at the Department of Philosophy, University of Nottingham, and Dean of the Faculty of Arts. He has written several books on this topic, including Dispositions (OUP, 1998), Laws in Nature (Routledge, 2004), Getting Causes from Powers (with Rani Lill Anjum, OUP, 2011), and Metaphysics: A Very Short Introduction (OUP, 2012). You can see the latest from him on Twitter – @SDMumford

Rani Lill Anjum is Research Fellow at the Norwegian University of Life Sciences where she leads the Causation in Science research project (CauSci). CauSci is a global network for those interested in a scientifically informed philosophy of causation. She has written many popular articles in magazines and newspapers and delivered numerous talks for non-specialist audiences. She is the co-author of Getting Causes from Powers (OUP, 2011). She is also on Twitter – @ranilillanjum

The Very Short Introductions (VSI) series combines a small format with authoritative analysis and big ideas for hundreds of topic areas. Written by our expert authors, these books can change the way you think about the things that interest you and are the perfect introduction to subjects you previously knew nothing about.

Image credits: 1) Contraceptive pill, by Bryancalabro (Own work) [CC-BY-SA-3.0], via Wikimedia Commons 2) David Hume, by Allan Ramsay [Public domain], via Wikimedia Commons

3. David Blockley

I take the view that our knowledge constitutes models (in Popper’s world 3) of the actual world (Popper’s world 1). These models have been tested to various degrees in certain contexts; for example, Newtonian mechanics is highly tested in situations where velocities are not approaching the speed of light. In other contexts (at velocities near the speed of light) the models may be shown to be false. We can infer from those models as long as we understand the contexts in which they work. Where our models are probabilistic, they are expressed as trends. I think we should think of correlation as being useful where we have no explanatory models. Causation is useful and applicable when we do have explanatory models.