Lessons Learned: What Big Data and Predictive Analytics Missed in 2016

In this era of the software-driven business, we’re told that “data is the new oil” and that predictive analytics and machine intelligence will extract actionable insights from this valuable resource and revolutionize the world as we know it. Yet 2016 brought three highly visible failures of this predictive view of the world: the UK’s Brexit plebiscite, the Colombian referendum on the FARC peace deal, and the U.S. presidential election. What did these scenarios have in common? They all dealt with human behavior. That got me thinking that there might be lessons here that are relevant to analytics.

The fact that data can be noisy or corrupted is well known. The question is: how does the uncertainty within the data propagate through the analytics and manifest itself in the accuracy of predictions derived from this data? For the purposes of this article, the analysis can be statistical, game-theoretic, deep learning-based, or anything else.

There is also an important distinction between what I call “hard” data and “soft” data. This is not standard terminology, so let me define what I mean by these terms.

Hard data comes from observations and measurements of the macroscopic natural world: the positions of astronomical objects, the electrical impulses within the brain, or even the amounts of your credit card transactions. Typically, such data is objective. The observations are numerical, and the uncertainty is adequately characterized as an error zone around a central value.  There is an (often unstated) assumption that the observation is trusted and repeatable (i.e., nature is not being adversarial and presenting the observer with misleading results).

Much effort has gone into designing measurement apparatus, calibration techniques, and experimental design to reduce the error zones. There is even the so-called “personal equation” to account for observer bias. And concepts such as error propagation and numerical stability allow numerical computing and statistics to build reliable models from data with this form of uncertainty.
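To make this concrete, here is a minimal sketch of classical error propagation, assuming independent, roughly Gaussian measurement errors; the measured quantities and their uncertainties are hypothetical, chosen purely for illustration:

```python
# Sketch: propagating "hard" measurement uncertainty through a derived quantity.
# Assumes independent, roughly Gaussian errors; all numbers are hypothetical.
import numpy as np

length = (10.0, 0.1)  # measured value and one-sigma uncertainty
width = (4.0, 0.2)

def area_with_uncertainty(l, dl, w, dw):
    """First-order (quadrature) error propagation for area = l * w."""
    area = l * w
    rel_err = np.sqrt((dl / l) ** 2 + (dw / w) ** 2)
    return area, area * rel_err

area, err = area_with_uncertainty(length[0], length[1], width[0], width[1])

# Cross-check with a brute-force Monte Carlo of the same measurements
rng = np.random.default_rng(0)
samples = rng.normal(*length, 100_000) * rng.normal(*width, 100_000)

print(f"quadrature:  {area:.2f} +/- {err:.2f}")
print(f"monte carlo: {samples.mean():.2f} +/- {samples.std():.2f}")
```

The two estimates agree closely, which is the point: when the error model of hard data is well characterized, uncertainty flows through the computation in a predictable way.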

The robustness of such hard data analytics techniques allowed Johannes Kepler to derive his laws of planetary motion in the early 1600s from Tycho Brahe’s observations and, earlier this year, allowed astrophysicists to demonstrate the presence of gravitational waves from data in which the noise outweighed the signal by many orders of magnitude.

Soft data, in contrast, derives from observations of a social world and is typically subjective. Observations may be numerical (rank the following on a scale of 1-5) or categorical (classify the following as “agree,” “disagree,” or “neither agree nor disagree”) and are typically drawn from a sample of the entire population. And while human responses are definitely soft, other types of data may also have this characteristic.  In fact, “hard” and “soft” are likely the end points of a spectrum, and we may even want to talk about the “hardness” of the data (just as we do for water – except here hardness is good).

Here’s the important question: Can a behavioral model derived from the soft responses of a population sample reliably predict the actions of the entire population? The sources of error (and uncertainty) are, to my mind, twofold:

The problem of sample fidelity has been studied extensively in statistics, and some form of randomization is the usual solution to the problem. This generally works, but is not foolproof and is subject to challenges in today’s software-driven world.

When conducting an online-only or mobile-phone survey, is a significant segment of the senior-citizen demographic being overlooked? Or an entire socio-economic sector? Investigating the spending patterns of buyers in a certain demographic (teenagers with smartphones) via mobile may be fine, but the same approach may prove unreliable when looking at voting patterns.
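As a toy illustration of this coverage problem, here is a sketch comparing a properly randomized sample with an online-only sample that under-reaches seniors. The population shares, reach probabilities, and support rates are entirely made up:

```python
# Sketch: how coverage bias skews a survey estimate. All figures are synthetic.
import numpy as np

rng = np.random.default_rng(42)
N = 100_000  # hypothetical population size

# 20% seniors, 80% everyone else, with different true support rates
is_senior = rng.random(N) < 0.20
supports = np.where(is_senior, rng.random(N) < 0.35, rng.random(N) < 0.55)

# A properly randomized sample of the whole population
random_sample = rng.choice(N, size=2_000, replace=False)

# An online-only sample: seniors are far less likely to be reached
reach_prob = np.where(is_senior, 0.05, 0.95)
reachable = np.flatnonzero(rng.random(N) < reach_prob)
online_sample = rng.choice(reachable, size=2_000, replace=False)

print(f"true population support: {supports.mean():.3f}")
print(f"randomized sample:       {supports[random_sample].mean():.3f}")
print(f"online-only sample:      {supports[online_sample].mean():.3f}")
```

Both samples are the same size, yet the online-only estimate drifts several points away from the true value because an entire demographic is quietly under-represented, and collecting more data through the same biased channel does nothing to fix it.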
