Bad Data Mining

November 23, 2016 1:00pm by Barry Ritholtz

I keep promising to stop writing about lessons from the election that are applicable to markets, and then I keep finding more examples. So rather than make any promises I cannot keep, let’s just jump right into this.

Since Donald Trump’s surprise victory — though it wasn’t a surprise to those of you with the power of hindsight — there have been numerous after-the-fact explanations for why Trump beat Hillary Clinton. Many appear to be delightful exercises in data mining, the finding of “historical patterns that are driven by random, not real, relationships.” Add to this the assumption that these explanations are durable and will repeat in the future, and you have the makings of a terrible investment process.

Consider the various claims as to what the key to the election was:

Local health outcomes predict Trumpward swings (the Economist)
Education, not income, predicted who would vote for Trump (FiveThirtyEight)
Two economic variables perfectly predict election results (Statistical Ideas)
Clinton won 64 percent of America’s economic activity versus Trump’s 36 percent (Washington Post)
Clinton won the cities, Trump won the suburbs (New York Times)

None of these elements “predicted” anything. Each was the result of an analysis of what had already occurred. After the election, the data were sifted until a dividing line turned up in each data set, with most Trump voters on one side and most Clinton voters on the other, and a conclusion was drawn from it.

This is classic data mining, and it should never be relied upon to make future forecasts.
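
To see why sifting after the fact is so unreliable, here is a minimal sketch in Python. Everything in it is made up for illustration: the 50 "regions," the 200 candidate variables, the seed and the function names are all assumptions, and every series is pure random noise. It simply searches the noise for whichever variable lines up best with the "result."

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no information
    return cov / math.sqrt(var_x * var_y)


def best_after_the_fact(outcome, candidates):
    """Return (index, correlation) of the candidate that best 'explains' outcome."""
    scored = [(abs(pearson(c, outcome)), i) for i, c in enumerate(candidates)]
    _, index = max(scored)
    return index, pearson(candidates[index], outcome)


rng = random.Random(2016)  # arbitrary seed so the run is repeatable
N_REGIONS = 50       # hypothetical: one observation per region
N_VARIABLES = 200    # hypothetical: number of data series sifted

# Both the "vote swing" and every candidate variable are pure noise.
swing = [rng.gauss(0, 1) for _ in range(N_REGIONS)]
noise_vars = [[rng.gauss(0, 1) for _ in range(N_REGIONS)]
              for _ in range(N_VARIABLES)]

winner, r = best_after_the_fact(swing, noise_vars)
print(f"Best of {N_VARIABLES} random variables: #{winner}, correlation {r:+.2f}")
```

Sift enough unrelated series and one of them will look like an explanation, even though by construction none of them has any relationship to the outcome.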

Salil Mehta, former TARP director of analytics and author of “Statistics Topics,” has been critical of pollsters’ election forecasts. He spent much of the time before the election lecturing them that their models were underestimating the possibility of a Trump victory. In an e-mail exchange, he observed:

There is an increased craving to slice and dice the recent election data, particularly given that the major pollsters have been shamed as they all immensely errored in projecting this year’s election’s victor. All gave President-elect Trump <15% a faux probability of winning. The risk of now retorting with data-mining this single election result is that they often miss an analysis of the predictive errors in this unique match-up (e.g., record high undecideds on Election eve), don’t take into account budding geospatial patterns to validate evidence, and in most case none of this should deceptively be promoted as an election forecasting model.

Correlations are very different from what is required to create a reliable model that correctly forecasts a future election or investing outcomes. Rather than mine data, Mehta suggests instead we engage in hypothesis testing.
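
One way to picture the difference is a permutation test: state a single hypothesis before looking, then ask how often shuffled data would produce a correlation at least as strong. The sketch below is only an illustration, not Mehta's method. The data are synthetic, and the variable names, seed and trial count are assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, trials=999, seed=0):
    """Share of shuffles whose |correlation| is at least the observed one.

    The +1 in numerator and denominator counts the observed arrangement
    itself, so the p-value can never be exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


# Synthetic example: one pre-specified variable, chosen before looking.
rng = random.Random(7)
education = [float(i) for i in range(20)]              # made-up values
swing = [0.5 * e + rng.gauss(0, 1) for e in education]  # made-up values
p = permutation_p_value(education, swing)
print(f"p-value for the one hypothesis stated in advance: {p:.3f}")
```

The p-value only means what it says when the hypothesis was fixed in advance. Run the same test on hundreds of variables and report the winner, and you are back to data mining.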

The obvious parallel to investing is the myriad of back-tested strategies, many of which engage in the same sort of data mining as the recent election post-mortems. They appear to have worked perfectly in the past, but they tend to be far less robust once real money is on the line. Models that inform us of what has already happened but not what might occur in the future are of limited value.

Cliff Asness of AQR warns us not to confuse factor investing with data mining. He notes that Fama-French factors such as value, momentum and size have all been tested out of sample and proven to be robust. Out-of-sample testing could verify whether an election model’s backtest is valid: Take the five data claims above, then apply them to Obama versus McCain or Bush versus Gore to see if they are at all predictive. The same is true for investing models. To avoid poorly constructed models that are form-fitted to past experience, apply them to data sets other than the ones used to build them.
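
Here is a rough sketch of what out-of-sample testing looks like in practice, under stated assumptions: 500 hypothetical "strategies" whose monthly returns are random noise with no true edge, a 20-year history, and a split at the 10-year mark. The function name and all parameters are invented for the example.

```python
import random


def mean(values):
    return sum(values) / len(values)


def pick_then_validate(returns_by_name, split):
    """Choose the strategy with the best mean return before `split`,
    then report how that same strategy did from `split` onward."""
    best = max(returns_by_name,
               key=lambda name: mean(returns_by_name[name][:split]))
    in_sample = mean(returns_by_name[best][:split])
    out_of_sample = mean(returns_by_name[best][split:])
    return best, in_sample, out_of_sample


rng = random.Random(42)   # arbitrary seed so the run is repeatable
MONTHS = 240              # hypothetical: 20 years of monthly returns
SPLIT = 120               # first 10 years used to pick, last 10 held out
strategies = {
    f"strategy_{i}": [rng.gauss(0.0, 0.04) for _ in range(MONTHS)]
    for i in range(500)   # every "strategy" is noise with zero true edge
}

name, fitted, held_out = pick_then_validate(strategies, SPLIT)
print(f"{name}: in-sample {fitted:+.4f}/month, "
      f"out-of-sample {held_out:+.4f}/month")
```

The winner always looks good on the data used to choose it, because that is how it was chosen. The held-out period is the only honest read on whether there was anything there.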

If a gold mine is a hole in the ground with a liar standing on top of it, a successful data miner is a quant with a data set, lying to himself. You probably have never seen a sales pitch that didn’t have a backtest “proving” market-beating returns. If only you had a time machine to go back to the period of time covered by the data set.

Investing after the fact is easy. Investors should be cautious when presented with results that only tell them what just happened, not what is about to occur.

_________

Lots of “multicollinearities” (economic inequality, poor health, low educational attainment) may be associated with Trump voters, but they are not likely to forecast the next election. For example, higher education (and therefore better health and possibly higher income) might indicate a proclivity toward voting red or blue, but as Mehta points out, not all college degrees are created equal. Some generate much greater potential future incomes than others (the group is “heterogeneous”).

Originally: Beware of Data Mining to Help Your Investments

