When I started my career as a medical statistician in September 1972, medical research was very different from now. In that month, the Lancet and the British Medical Journal published 61 research reports which used individual participant data, excluding case reports and animal studies. The median sample size was 36 people. In July 2010, I had another look. The two journals published 31 such reports, with a median sample size of 6,000 people. The journals published fewer research papers, which were correspondingly longer, but with enormously greater sample sizes.
There were other changes. In 1972, only 58% of research papers in the Lancet and the British Medical Journal included the results of any statistical calculations (nearly all significance tests), and only three reports gave any reference to a statistical work, in each case to a textbook which was already out of date. In 2010, all research papers in these journals included details of their statistical analyses: in one case as a paragraph in the Research Methods section, and in all the others as a subsection devoted to statistics. Most papers reported confidence intervals rather than, or in addition to, significance tests. I think that the increased sample sizes, the greater length of papers, and the increased statistical detail are all indicators of greatly increased research quality in the top medical journals.
What led to this research revolution? One force was the movement for evidence-based medicine, spreading the idea that treatment decisions should be based on objective evidence rather than on experience and authority. Such evidence would include statistics. Use of the term evidence-based medicine began in the 1990s with the work of Gordon Guyatt and David Sackett, but the ideas were around long before then. Statisticians, whose business was the evaluation of evidence, were enthusiastic cheerleaders. The demand for evidence led to systematic reviews, where we collect together all the trials of a therapy which have ever been carried out, and try to form a conclusion about effectiveness. Iain Chalmers led a huge project to assemble all the trials ever done in obstetrics. He went on to found the Cochrane Collaboration, which aims to do the same for all of medicine. The Lancet and the British Medical Journal now typically include a systematic review every week.
As an alternative solution to the problem of inadequate sample sizes, Richard Peto led the call for large, simple trials, his first being the ‘First International Study of Infarct Survival’. Published in 1986, the report of this trial included the sample size of 16,027 patients in the title. Unlike Guyatt, Sackett, and Chalmers, who are, or were, doctors, Richard Peto is a statistician. Another statistically led movement was to evaluate evidence using confidence intervals rather than significance tests, particularly for clinical trials. The idea was to estimate the plausible size of the difference between treatments rather than simply say whether there is evidence that a difference exists. A paper by Martin Gardner and Doug Altman in 1986 led to the British Medical Journal including this in its instructions for authors. Other journals, such as the Lancet, followed suit.
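To make the contrast between the two approaches concrete, here is a minimal Python sketch. It is only an illustration: the event counts are invented and come from no real trial, the function names are my own, and the large-sample Normal approximation is just one of several methods that could be used. The significance test answers "is there evidence of a difference?", while the confidence interval gives a range of plausible sizes for that difference.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(events_a, n_a, events_b, n_b, level=0.95):
    """Difference in proportions (A minus B) with a large-sample confidence interval."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b
    # Standard error of the difference, using each group's own proportion.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Normal quantile for the chosen level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


def two_sided_p_value(events_a, n_a, events_b, n_b):
    """Large-sample z test of the null hypothesis that the two proportions are equal."""
    pooled = (events_a + events_b) / (n_a + n_b)
    # Under the null hypothesis both groups share the pooled proportion.
    se_null = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (events_a / n_a - events_b / n_b) / se_null
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Invented example: 120 deaths among 1,000 controls, 90 among 1,000 treated.
diff, lower, upper = risk_difference_ci(120, 1000, 90, 1000)
p_value = two_sided_p_value(120, 1000, 90, 1000)
print(f"P value: {p_value:.3f}")
print(f"Risk difference: {diff:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

The P value alone says only that a difference is unlikely to be due to chance. The interval shows that, on these made-up numbers, the benefit could plausibly be anything from very small to quite large, which is the information a clinician actually needs.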
Reviews of the quality of statistical methods in medical journals began to sting journal editors into action and led to instructions to authors about statistical aspects of presentation of results. Following reviews of statistics, journals began to introduce statistical referees, with the systematic use of a panel of statisticians to check all research papers before they appeared in the journal. The main difficulty was finding enough statisticians. Finally, in 1996 the first consolidated statement on reporting trials (CONSORT) was published, giving guidelines for reporting trials, encouraging researchers to provide information about methods of determining sample size, allocation to treatments, statistical analysis, etc.
We cannot know which, if any, of these forces is responsible for improvements in the statistical quality of the top clinical literature. We should beware of the logical fallacy of post hoc ergo propter hoc – just because improvements followed all this activity does not necessarily imply that they were caused by it. As statisticians say, correlation does not imply causation. But I think that the combination of factors did matter and it was exciting to live through it, especially as I have known, and in some cases worked with, nearly all the major actors.
The quality of top clinical research has improved greatly, but does this matter? Has medicine improved? I wondered what might be a good indicator and settled on life expectancy at age 65, the average number of years 65-year-olds would live if the current death rates were to apply through their remaining time. I thought that the health of the old may respond particularly to improvements in medicine, as the old are its main consumers. I knew that from its first calculation for England and Wales in 1841, life expectancy at 65 changed very little for a century.
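The definition of life expectancy at 65 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the method used for the official England and Wales figures: the function name is hypothetical, the death probabilities are invented, and people who die during a year are assumed to live, on average, half of it.

```python
def life_expectancy(death_probs):
    """Average remaining years of life under a fixed schedule of death rates.

    death_probs[k] is the probability that someone alive at exact age 65 + k
    dies before reaching age 65 + k + 1. The last entry should be 1.0 so that
    the table is closed and nobody is left alive at the end.
    """
    alive = 1.0   # proportion of the starting group still alive
    years = 0.0   # years lived per person in the starting group
    for q in death_probs:
        deaths = alive * q
        # Survivors contribute a full year; those who die are assumed
        # (a simplification) to contribute half a year on average.
        years += (alive - deaths) + 0.5 * deaths
        alive -= deaths
    return years


# Invented schedule: risk of death starts at 1.5% at age 65, rises by 10%
# for each further year of age, and the table is closed at age 110.
rates = [min(1.0, 0.015 * 1.1 ** k) for k in range(45)] + [1.0]
print(f"Life expectancy at 65: {life_expectancy(rates):.1f} years")

# The same calculation with every death rate cut by a quarter.
lower_rates = [0.75 * q for q in rates[:-1]] + [1.0]
print(f"With lower death rates: {life_expectancy(lower_rates):.1f} years")
```

Because the calculation applies one year's death rates to every future age, it describes current mortality rather than predicting what today's 65-year-olds will actually experience.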
As the graph shows, in the 19th century there was little difference between the life expectancy of men and women. For women, a slight increase began at the start of the 20th century, which continued throughout the century. One possible explanation for this is that women in the 20th century had far fewer pregnancies than women in the 19th, and so arrived at age 65 healthier and fitter than previous generations. For men, life expectancy at 65 increased very little until 1971. Then it began to rise rapidly, faster than that for women, so that men have almost caught up. Women considerably outliving men may prove to be a 20th century phenomenon: expectation of life at age 65 is now 18 years for men and 21 years for women. For both, the remaining years of life are half as much again as they were. This period of rapid improvement in statistical methods in the best medical research has coincided with a rapid improvement in the health of the older members of the population. People are living longer, healthier lives.
For the writer of medical statistical textbooks, these changes have required a lot of updating and expansion of succeeding editions, to accommodate the new methods and larger studies appearing in journals. As I am now over 65, I can look back and think that it was definitely worth it.