Ten Myths About Machine Learning

Ten Myths About Machine Learning, by Pedro Domingos

Machine learning used to take place behind the scenes: Amazon mined your clicks and purchases for recommendations, Google mined your searches for ad placement, and Facebook mined your social network to choose which posts to show you. But now machine learning is on the front pages of newspapers, and the subject of heated debate. Learning algorithms drive cars, translate speech, and win at Jeopardy! What can and can’t they do? Are they the beginning of the end of privacy, work, even the human race? This growing awareness is welcome, because machine learning is a major force shaping our future, and we need to come to grips with it. Unfortunately, several misconceptions have grown up around it, and dispelling them is the first step. Let’s take a quick tour of the main ones:

Machine learning is just summarizing data. In reality, the main purpose of machine learning is to predict the future. Knowing the movies you watched in the past is only a means to figuring out which ones you’d like to watch next. Your credit record is a guide to whether you’ll pay your bills on time. Like robot scientists, learning algorithms formulate hypotheses, refine them, and only believe them when their predictions come true. Learning algorithms are not yet as smart as scientists, but they’re millions of times faster.

Learning algorithms just discover correlations between pairs of events. This is the impression you get from most mentions of machine learning in the media. In one famous example, an increase in Google searches for “flu” is an early sign that it’s spreading. That’s all well and good, but most learning algorithms discover much richer forms of knowledge, such as the rule “If a mole has irregular shape and color and is growing, then it may be skin cancer.”

Machine learning can only discover correlations, not causal relationships. In fact, one of the most popular types of machine learning consists of trying out different actions and observing their consequences — the essence of causal discovery. For example, an e-commerce site can try many different ways of presenting a product and choose the one that leads to the most purchases. You’ve probably participated in thousands of these experiments without knowing it. And causal relationships can be discovered even in some situations where experiments are out of the question, and all the computer can do is look at past data.
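To make the "try actions and observe consequences" idea concrete, here is a minimal sketch of one common approach, an epsilon-greedy strategy, applied to the e-commerce example. Everything in it is an assumption for illustration: the function name `run_experiment`, the simulated purchase rates, and the parameter values are hypothetical, and real sites use more careful experimental designs.

```python
import random


def run_experiment(true_rates, n_visitors=20000, epsilon=0.1, seed=0):
    """Simulate showing product-page variants to visitors (hypothetical setup).

    true_rates: assumed purchase probability of each variant, unknown to the learner.
    Returns the index of the variant with the best observed rate, the number of
    times each variant was shown, and the observed purchase rates.
    """
    rng = random.Random(seed)
    n_variants = len(true_rates)
    shows = [0] * n_variants
    purchases = [0] * n_variants

    for _ in range(n_visitors):
        if rng.random() < epsilon or 0 in shows:
            # Explore: pick a variant at random (also used until every
            # variant has been shown at least once).
            variant = rng.randrange(n_variants)
        else:
            # Exploit: pick the variant with the best observed purchase rate.
            variant = max(range(n_variants), key=lambda i: purchases[i] / shows[i])

        shows[variant] += 1
        # Observe the consequence of the action: did this visitor buy?
        if rng.random() < true_rates[variant]:
            purchases[variant] += 1

    observed = [p / s for p, s in zip(purchases, shows)]
    best = max(range(n_variants), key=lambda i: observed[i])
    return best, shows, observed


if __name__ == "__main__":
    best, shows, observed = run_experiment([0.02, 0.05, 0.15])
    print("best variant:", best)
    print("times shown:", shows)
    print("observed purchase rates:", [round(r, 3) for r in observed])
```

Because the learner chooses which variant each visitor sees, a difference in purchase rates reflects the effect of the presentation itself rather than a coincidental pattern in past data.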

Machine learning can’t predict previously unseen events, a.k.a. “black swans.” If something has never happened before, its predicted probability must be zero — what else could it be? On the contrary, machine learning is the art of predicting rare events with high accuracy. If A is one of the causes of B and B is one of the causes of C, A can lead to C, even if we’ve never seen it happen before. Every day, spam filters correctly flag freshly concocted spam emails. Black swans like the housing crash of 2008 were in fact widely predicted — just not by the flawed risk models most banks were using at the time.

The more data you have, the more likely you are to hallucinate patterns. Supposedly, the more phone records the NSA looks at, the more likely it is to flag an innocent person as a potential terrorist because that person happened to match a terrorist detection rule. Mining more attributes of the same entities can indeed increase the risk of hallucination, but machine learning experts are very good at keeping it to a minimum.
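The effect of adding attributes can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not anything from the original text: it generates purely random labels and purely random yes/no attributes, then reports how well the best-matching attribute agrees with the labels. The function name `best_spurious_agreement` and all parameter values are hypothetical.

```python
import random


def best_spurious_agreement(n_entities, n_attributes, seed=0):
    """Return the highest agreement between random labels and any random attribute.

    Labels and attributes are independent coin flips, so any agreement
    above 50% is a pattern that exists only by chance.
    """
    rng = random.Random(seed)
    labels = [rng.random() < 0.5 for _ in range(n_entities)]
    best = 0.0
    for _ in range(n_attributes):
        attribute = [rng.random() < 0.5 for _ in range(n_entities)]
        agreement = sum(a == y for a, y in zip(attribute, labels)) / n_entities
        best = max(best, agreement)
    return best


if __name__ == "__main__":
    # Few entities, growing number of attributes: chance matches look impressive.
    for n_attributes in (1, 10, 100, 1000):
        print(20, n_attributes, best_spurious_agreement(20, n_attributes))
    # Same number of attributes, many more entities: chance matches fade.
    print(2000, 1000, best_spurious_agreement(2000, 1000))
```

With only 20 entities, searching through 1000 random attributes reliably turns up one that appears to predict the label well, even though none is related to it. With 2000 entities and the same 1000 attributes, the best chance agreement stays close to 50%.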
