Oxford University Press's
Academic Insights for the Thinking World

# Separating investment facts from flukes

There are hundreds of investment products in the market that claim to outperform. The idea is that certain information can be identified that allows us to pick the stocks that will do better than average and those that will do worse. When you buy the stocks you think will do better and short sell the ones you think will do worse, you have potentially identified a strategy that will ‘beat the market.’

Most of the empirical academic papers that try to identify these factors are wrong.

Here is the intuition. Suppose we identify some variable ‘X’ that we think will predict the market. We measure the correlation. In the usual approach, we want to have 95% confidence that we have identified something real. In other words, we are willing to tolerate a 5% chance that we have a false positive – or a fluke. The usual test here makes sure that the correlation is two standard deviations away from zero. If the variable passes this hurdle, we declare it ‘significant’.

Now let’s change the problem. Instead of examining a single ‘X’, we look at 20 different ‘X’s, that is, X1, X2, …, X20. We measure the correlation for each one and find that one of these variables, say ‘X7’, passes the two standard deviation test. We declare it ‘significant.’ However, this is a serious mistake.

Given that we are trying so many variables, one of them might show up with a large correlation purely by chance, i.e. a fluke. In this particular case, finding that ‘X7’ is two standard deviations from zero does not mean there is a 5% chance of a fluke finding. If the 20 tests are independent, the chance that at least one of them clears the hurdle by luck alone is 1 − 0.95^20, or about 64%. In short, when you try lots of variables, the two standard deviation rule fails. We must increase the hurdle or we will be disappointed with the performance of our investments.
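The arithmetic behind the 64% figure can be sketched in a few lines of Python. This is only an illustration under two assumptions that are not in the text above: the 20 tests are treated as independent, and the higher hurdle is computed with a simple Bonferroni adjustment, which is one standard way to raise the bar and not necessarily the one behind the exhibit. The function names are hypothetical.

```python
from statistics import NormalDist


def chance_of_fluke(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests
    passes the hurdle purely by chance (assumes independence)."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_hurdle(num_tests, alpha=0.05):
    """Two-sided hurdle, in standard deviations, after dividing the
    tolerated error rate by the number of tests (Bonferroni adjustment,
    used here as an illustrative assumption)."""
    per_test_alpha = alpha / num_tests
    return NormalDist().inv_cdf(1.0 - per_test_alpha / 2.0)


print(round(chance_of_fluke(1), 2))    # 0.05
print(round(chance_of_fluke(20), 2))   # 0.64
print(round(bonferroni_hurdle(1), 2))  # 1.96, the usual 'two standard deviation' rule
print(bonferroni_hurdle(20))           # roughly three standard deviations
```

With a single variable the hurdle is the familiar value of about two. With 20 variables, keeping the overall chance of a fluke at 5% pushes the hurdle to roughly three standard deviations.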

It is important to consider the history of factor discovery, which began in 1964. Since then, 316 factors have been published in top finance and economics journals, and it is likely that most of them are flukes.

In the exhibit below, the green curve and the right-hand side axis show the history of factor discovery as well as an extrapolation out through 2030. The dashed line (and left-hand side axis) shows the usual rule (two standard deviations from zero) to declare a finding ‘significant’. The blue line shows a new recommended hurdle for declaring something ‘significant’. The ‘x’ symbols represent some of the more prominent discoveries.

This exhibit is supposed to ‘rewrite history.’ Take, for example, 1992, the year that the famous Fama and French paper, which argued for a three factor model, was published in The Journal of Finance. The extra two factors they discovered are labeled HML (a value factor where you buy stocks with a high ratio of book value to market capitalization and sell stocks with a low ratio) and SMB (buy small stocks and sell large stocks). Both HML and SMB clear the traditional hurdle of two standard deviations. However, by 1992 many factors had already been tried. The blue line shows that if multiple testing had been allowed for in 1992, only HML would have been declared significant. The three factor model is really a two factor model. Interestingly, post-1992, SMB has failed to deliver a positive average return. This is consistent with its discovery in 1992 being a false positive.

There are two additional considerations.

First, it is important to control for correlation of tests. The existing tools in statistics do not allow for specific correlations. Many strategies in finance are correlated. For example, the correlation of 20 different momentum factors is very high whereas the correlation of 20 different global macro factors could be quite low. It is possible to adjust the blue line in the exhibit for correlation.
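To see why correlation matters, here is a small Monte Carlo sketch. It is not the adjustment used for the exhibit. It assumes, purely for illustration, that none of the 20 factors is real and that all of their test statistics share one common correlation `rho` (an equicorrelated setup); the function name and parameters are hypothetical.

```python
import random


def familywise_fluke_rate(num_tests, rho, hurdle=1.96, trials=20000, seed=0):
    """Estimate the chance that at least one of num_tests test statistics
    clears the hurdle when no factor is real.

    Each statistic is built as sqrt(rho) * common + sqrt(1 - rho) * own,
    so any two statistics have correlation rho (illustrative assumption).
    """
    rng = random.Random(seed)
    common_weight = rho ** 0.5
    own_weight = (1.0 - rho) ** 0.5
    hits = 0
    for _ in range(trials):
        common = rng.gauss(0.0, 1.0)  # shock shared by all factors in this trial
        if any(
            abs(common_weight * common + own_weight * rng.gauss(0.0, 1.0)) > hurdle
            for _ in range(num_tests)
        ):
            hits += 1
    return hits / trials


independent = familywise_fluke_rate(20, rho=0.0)       # e.g. loosely related macro factors
highly_correlated = familywise_fluke_rate(20, rho=0.9) # e.g. 20 momentum variants
print(independent, highly_correlated)
```

In this setup the independent case lands close to the 64% figure, while the highly correlated case produces a fluke far less often, because 20 nearly identical tests behave more like a handful of tests than like 20 separate ones. That is the sense in which the required hurdle depends on how correlated the strategies are.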

Second, while 316 factors have been published in top journals, there are many more factors published in lower tier journals and potentially thousands of factors that were tried by academics but never made it to publication. In addition, there are hundreds of practitioner researchers involved in data mining who don’t even try to publish their factors. All of this reinforces the case that the hurdle for significance needs to increase.
