I’m sure you’ve had this experience. You want to get somewhere, say a concert or a public building, and everyone is stopped by security officials, who ask to search your bag. They open it, perhaps take out one or two items, glance around inside the rest, then hand it back and let you go. Unfortunately, though they are acting with good intentions, those security guards are wasting their time (and yours).
The most recent time this happened, it reminded me of a startling academic paper, first published in 1978, in the New England Journal of Medicine. Dr Ward Casscells and colleagues reported something very disturbing: that most doctors can’t calculate risks correctly.
The question they posed was this. Imagine a disease (let’s call it Gobble’s disease*), which has a prevalence of 1 in 1000 in your population. There is a test for Gobble’s disease, and you know it has a false positive rate of 5%. You meet a patient in your clinic, who has tested positive. What is the probability that the patient has Gobble’s disease?
A member of the public could be forgiven for thinking the answer is 100%. After all, medical tests are always reliable, right? Someone a bit savvier, say a doctor, might look at that 5% false positive rate and decide the answer is 95%. That’s what most of the respondents in Casscells’ study said, and the question was put to senior doctors, junior doctors, and medical students alike. (And, if you had offered it to me as a medical student or a junior doctor, that’s almost certainly what I would have said – even though, in all fairness, my medical school tried hard to teach us the truth.)
But they would all be hopelessly wrong. A statistician would reason like this: suppose you test the whole population for Gobble’s disease. A 5% false positive rate means that 5% of the people who don’t have the disease will nevertheless test positive – that is, about 50 people in every 1,000. But only 1 person in every 1,000 actually has Gobble’s disease, so out of roughly 51 positive results, only 1 is a true positive. The probability that your patient – who tested positive for Gobble’s disease – actually has the disease is therefore about 1 in 51, or just under 2%.
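The statistician’s arithmetic can be sketched in a few lines of Python. The figures are those of the fictitious Gobble’s disease example (the 100% sensitivity is an assumption the article makes explicit later):

```python
# Bayes' theorem for the Gobble's disease example: what is the
# probability of disease, given a positive test result?
prevalence = 1 / 1000        # 1 in 1000 people have the disease
sensitivity = 1.0            # assume the test never misses a true case
false_positive_rate = 0.05   # 5% of healthy people test positive anyway

true_positives = prevalence * sensitivity                 # per person tested
false_positives = (1 - prevalence) * false_positive_rate  # per person tested

# Positive predictive value: P(disease | positive test)
ppv = true_positives / (true_positives + false_positives)
print(f"P(disease | positive) = {ppv:.1%}")  # just under 2%
```

The denominator is the key step most of Casscells’ respondents skipped: a positive result can come either from the 1 true case or from the 50-odd healthy people the test wrongly flags.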
This result is so unexpected, so counter-intuitive, that it’s worth looking at more closely.
All medical tests have two basic properties. These are known as sensitivity and specificity. The sensitivity is the probability that the patient will test positive for the disease, if they actually have it. Our fictitious Gobble’s test is, we assume, 100% sensitive: it will always detect someone with Gobble’s disease. In practice, few medical tests approach 100% sensitivity.
The specificity is the probability that the patient will test negative for the disease if they haven’t got the disease. Our Gobble’s test is 95% specific: if the patient doesn’t have Gobble’s disease, there is a 95% likelihood that they will test negative for the disease. That sounds great, until we remember that there’s a 5% likelihood they will test positive, which is the cause of all our problems. Sadly, in reality, few medical tests approach 95% specificity.
In reality, sensitivity and specificity are two sides of the same coin: making a test more sensitive generally makes it less specific, and so produces more false positives (which might, as we have seen, drown out the true positives we are actually interested in). An extreme example would be a ‘test’ that simply declares everyone positive: you would never miss anyone with the disease, but there would be so many false positives that the test would be useless.
The reason our test for Gobble’s disease is so unhelpful is that Gobble’s disease is rare. The test becomes much more valuable if Gobble’s disease is more common. Therefore to make it more useful, we shouldn’t apply the test indiscriminately, but we should try to narrow down our focus to people with risk factors. If Gobble’s disease is rare in the young but gets more common in the elderly (as many cancers do), then we can improve the usefulness of the test by applying it only to the elderly.
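The effect of prevalence can be seen by re-running the same calculation for progressively more common versions of the disease. The prevalences below are illustrative, not real epidemiology:

```python
# How the positive predictive value of the same test improves
# as the disease becomes more common. Illustrative figures only.
def ppv(prevalence, sensitivity=1.0, false_positive_rate=0.05):
    """P(disease | positive test), by Bayes' theorem."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)

for prev in (1 / 1000, 1 / 100, 1 / 10):
    print(f"prevalence {prev:.3f} -> P(disease | positive) = {ppv(prev):.1%}")
```

With a prevalence of 1 in 1,000 the positive predictive value is under 2%, but at 1 in 10 (say, an elderly high-risk group) it climbs to nearly 70% – which is why targeting the test at people with risk factors makes it so much more useful.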
The other way in which we can improve the usefulness of our test is to combine it with other tests. Say our test is quick and safe. We can apply it easily to a large number of people. But to those who test positive, we can then go on and apply a different test, perhaps one which is more invasive or more expensive. Patients who test positive for both are much more likely to actually have Gobble’s disease.
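Combining tests amounts to applying Bayes’ theorem twice: the probability after the first positive result becomes the starting point for the second test. The second test’s figures below are invented purely for illustration, and the two tests are assumed to err independently:

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test), starting from a prior probability."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# First test: the quick, cheap Gobble's test from the article.
after_first = posterior(1 / 1000, sensitivity=1.0, false_positive_rate=0.05)

# Second, more invasive test (hypothetical: 90% sensitive, 2% FPR),
# applied only to those who tested positive the first time.
after_second = posterior(after_first, sensitivity=0.90, false_positive_rate=0.02)

print(f"after one positive test:  {after_first:.1%}")   # just under 2%
print(f"after two positive tests: {after_second:.1%}")  # roughly 47%
```

Two positive results lift the probability from under 2% to nearly half – still not certainty, but enough to justify treating the patient very differently.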
That security guard, having a quick look through my bag, is applying a diagnostic test: do I have a dangerous item in there, or not? Unfortunately his test isn’t very sensitive, since he might easily miss something down at the bottom. And, since most people going to the concert are there to enjoy the music, the prevalence of miscreants is low. Therefore the simple mathematics of the test tells us it is likely to be worthless. The effectiveness of the test is multiplied by applying a different test: an X-ray scan of my bag, or even of my body. These are much more expensive than a quick visual check, but airports, understandably, are prepared to foot the bill.
There are powerful lessons to be learned here. The first is that applying a single test to a whole population is likely to be very unhelpful, especially if what you are looking for is rare. The second is that medical tests seldom give a clear-cut answer; instead they shorten or lengthen the odds of a particular diagnosis being true. Finally, many other tests (such as concert security checks) obey exactly the same mathematical rules as medical tests. A thorough understanding of the mathematics of probability helps no end in interpreting them. In the words of William Osler (often described as the father of modern medicine), “Medicine is a science of uncertainty and an art of probability”!
*‘Gobble’s Disease’ is an invented illness from the Oxford Handbook of Clinical Medicine.
Featured Image Credit: ‘Dice, Die, Probability’ by Jody Lehigh. CC0 Public Domain via Pixabay.