When Donald Trump first declared his candidacy for President of the United States, most analysts predicted that he had a very small chance of becoming the Republican nominee. Perhaps the most prominent of them was Nate Silver of FiveThirtyEight, who estimated that Trump had a 2% chance of winning the nomination. That estimate drew on historical data about past candidates, such as their backgrounds, whether they were widely endorsed by the party, and their past successes and failures. This is a standard prediction approach. It rests on the assumption that the thing you are trying to predict (Trump) is comparable to its historical antecedents (past GOP candidates) and can therefore be evaluated by how they performed. As is now clear, however, in unique cases like the Trump phenomenon, recent direct history can teach us only a little.
A similar problem crops up in polling. Political analysts use polls to estimate the likelihood of a candidate’s success. Polls are not perfect, however, and usually suffer from several kinds of bias, such as the effect of non-responders, the tradeoffs between calling landlines and cellphones, and shifting turnout patterns. To overcome these obstacles, political statisticians build models that try to correct polling errors using data from previous elections. This method assumes that current and historical polls suffer from the same kinds of errors. For example, analysts might assume that the population of non-responders is distributed similarly across time, an assumption that may or may not be true.
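The simplest version of such a correction can be sketched in a few lines. This is a deliberately minimal, hypothetical illustration (real poll-adjustment models are far richer). It estimates the average historical gap between final polls and actual results and adds that gap to today's poll. The function names `historical_bias` and `corrected_estimate` and the vote shares are invented for illustration and are not real election data. What the sketch makes visible is that the whole correction depends on one assumption: today's polls err the same way past polls did.

```python
from statistics import mean


def historical_bias(past_polls, past_results):
    """Mean signed polling error (actual result minus final poll) across past elections."""
    if len(past_polls) != len(past_results) or not past_polls:
        raise ValueError("need matching, non-empty lists of polls and results")
    return mean(result - poll for poll, result in zip(past_polls, past_results))


def corrected_estimate(current_poll, past_polls, past_results):
    """Shift today's poll by the average historical bias.

    This only helps if today's poll errs the way past polls did,
    which is exactly the assumption the text questions.
    """
    return current_poll + historical_bias(past_polls, past_results)


# Hypothetical vote shares (percent) for one candidate; NOT real election data.
past_polls = [48.0, 51.0, 46.5]
past_results = [49.5, 52.0, 48.0]
print(corrected_estimate(47.0, past_polls, past_results))
```

If non-response or turnout patterns have shifted since those past elections, the historical bias no longer describes today's error, and the "correction" can make the estimate worse.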
Compounding both problems, presidential elections are rare events, so historical data is limited: the sample is small, and much of it is outdated.
Predictive statisticians in the private sector face similar problems when trying to predict unexpected events, or when working from flawed or incomplete data. Simply turning the work over to machines won’t help. Most machine learning and statistical mining techniques share the same assumption: the historical data used to train a model behaves like the target data the model is later applied to. In practice this assumption often fails because the training data is obsolete, and collecting enough recent data that does satisfy it is often expensive or impractical.
To stay relevant, then, statisticians will have to abandon the purist position of fitting models solely on direct historical data. Instead, they should enrich their models with recent data from similar domains that better captures current trends.
This is known as Transfer Learning, a field that addresses these problems with algorithms that identify which areas of knowledge are “transferable” to the target domain. The transferable data can then be added to the training set, giving the model a broader base to learn from. These algorithms look for commonalities among the target task, recent tasks, previous tasks, and similar-but-not-identical tasks, and they guide the model to learn only from the relevant parts of the data.
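Instance weighting is one common family of transfer-learning methods, and it shows the idea of “learning only from the relevant parts” concretely. The sketch below is a minimal toy under stated assumptions. It is not the specific method the article has in mind. It fits a line to a small amount of recent target data plus a larger pool of older or related source data. Each source point is down-weighted by how far it lies from the target data. The Gaussian-kernel similarity, the `bandwidth` parameter, the function names, and the toy data are all illustrative assumptions. Real methods usually estimate importance weights more carefully, for example with density-ratio estimation.

```python
import math


def similarity_weights(source_x, target_x, bandwidth=1.0):
    """Assumed similarity: Gaussian kernel of each source point's distance
    to its nearest target point, so far-away points count for less."""
    weights = []
    for xs in source_x:
        nearest = min(abs(xs - xt) for xt in target_x)
        weights.append(math.exp(-(nearest ** 2) / (2 * bandwidth ** 2)))
    return weights


def weighted_linear_fit(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    total = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / total
    y_mean = sum(w * y for w, y in zip(ws, ys)) / total
    cov = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    return y_mean - slope * x_mean, slope


def transfer_fit(source, target, bandwidth=1.0):
    """Fit on target points (weight 1) plus similarity-weighted source points."""
    src_x, src_y = zip(*source)
    tgt_x, tgt_y = zip(*target)
    src_w = similarity_weights(src_x, tgt_x, bandwidth)
    xs = list(tgt_x) + list(src_x)
    ys = list(tgt_y) + list(src_y)
    ws = [1.0] * len(tgt_x) + src_w
    return weighted_linear_fit(xs, ys, ws)


# Hypothetical toy data: two recent target points, plus source points of which
# only the first two resemble the target regime; the last two come from a different one.
target = [(0.0, 1.0), (1.0, 3.0)]
source = [(0.5, 2.0), (1.5, 4.0), (10.0, 0.0), (12.0, 0.0)]
intercept, slope = transfer_fit(source, target, bandwidth=1.0)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

With weighting, the two dissimilar source points contribute almost nothing, and the fit follows the target trend. Fitting all six points with equal weight instead lets the irrelevant regime pull the slope negative. The `bandwidth` setting controls how strict the notion of “similar” is.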
In the example of the U.S.