Ten Myths About Machine Learning, by Pedro Domingos


Machine learning used to take place behind the scenes: Amazon mined your clicks and purchases for recommendations, Google mined your searches for ad placement, and Facebook mined your social network to choose which posts to show you. But now machine learning is on the front pages of newspapers, and the subject of heated debate. Learning algorithms drive cars, translate speech, and win at Jeopardy! What can and can’t they do? Are they the beginning of the end of privacy, work, even the human race? This growing awareness is welcome, because machine learning is a major force shaping our future, and we need to come to grips with it. Unfortunately, several misconceptions have grown up around it, and dispelling them is the first step. Let’s take a quick tour of the main ones:

Machine learning is just summarizing data. In reality, the main purpose of machine learning is to predict the future. Knowing the movies you watched in the past is only a means to figuring out which ones you’d like to watch next. Your credit record is a guide to whether you’ll pay your bills on time. Like robot scientists, learning algorithms formulate hypotheses, refine them, and only believe them when their predictions come true. Learning algorithms are not yet as smart as scientists, but they’re millions of times faster.

Learning algorithms just discover correlations between pairs of events. This is the impression you get from most mentions of machine learning in the media. In one famous example, an increase in Google searches for “flu” is an early sign that it’s spreading. That’s all well and good, but most learning algorithms discover much richer forms of knowledge, such as the rule “If a mole has irregular shape and color and is growing, then it may be skin cancer.”
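To make the difference concrete, here is a minimal sketch of a rule learner that scores every conjunction of boolean features rather than single feature-label pairs. The feature names mirror the mole rule above, but the examples are made up for illustration (not clinical data), and `best_conjunction` is a hypothetical helper, not any particular algorithm from the article. On this toy data every single feature is only right 4 times out of 6, while the three-way conjunction is right every time.

```python
from itertools import combinations

# Hypothetical boolean features, named after the mole rule in the text.
FEATURES = ("irregular_shape", "irregular_color", "growing")

# Made-up toy examples for illustration only: (features, label).
EXAMPLES = [
    ({"irregular_shape": True,  "irregular_color": True,  "growing": True},  True),
    ({"irregular_shape": True,  "irregular_color": True,  "growing": False}, False),
    ({"irregular_shape": True,  "irregular_color": False, "growing": True},  False),
    ({"irregular_shape": False, "irregular_color": True,  "growing": True},  False),
    ({"irregular_shape": False, "irregular_color": False, "growing": False}, False),
    ({"irregular_shape": True,  "irregular_color": True,  "growing": True},  True),
]


def rule_fires(conditions, example):
    """A conjunctive rule fires only if every condition in it holds."""
    return all(example[f] for f in conditions)


def accuracy(conditions, examples):
    """Fraction of examples where the rule's verdict matches the label."""
    return sum(rule_fires(conditions, x) == y for x, y in examples) / len(examples)


def best_conjunction(examples, features=FEATURES):
    """Exhaustively score every non-empty conjunction of features; keep the best."""
    candidates = [c for k in range(1, len(features) + 1)
                  for c in combinations(features, k)]
    return max(candidates, key=lambda c: accuracy(c, examples))


# Single-feature "pairwise" rules versus the best conjunction found.
for f in FEATURES:
    print(f, accuracy((f,), EXAMPLES))
rule = best_conjunction(EXAMPLES)
print(rule, accuracy(rule, EXAMPLES))
```

Exhaustive search is only feasible for a handful of features; practical rule learners search the space greedily instead.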

Machine learning can only discover correlations, not causal relationships. In fact, one of the most popular types of machine learning consists of trying out different actions and observing their consequences — the essence of causal discovery. For example, an e-commerce site can try many different ways of presenting a product and choose the one that leads to the most purchases. You’ve probably participated in thousands of these experiments without knowing it. And causal relationships can be discovered even in some situations where experiments are out of the question, and all the computer can do is look at past data.
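A rough simulation of the kind of experiment described above, assuming three hypothetical presentations (“A”, “B”, “C”) with made-up purchase probabilities that a real site would not know in advance. What licenses a causal reading is that each visitor is assigned a presentation at random, so the presentation is the only systematic difference between the groups. `run_ab_test` and `TRUE_RATES` are illustrative names, not an API.

```python
import random
from collections import Counter

# Hypothetical purchase probabilities per presentation; unknown in a real test,
# used here only to simulate visitors' responses.
TRUE_RATES = {"A": 0.02, "B": 0.03, "C": 0.05}


def run_ab_test(true_rates, visitors, seed=0):
    """Randomly assign each visitor a presentation and record observed purchase rates."""
    rng = random.Random(seed)
    variants = list(true_rates)
    shown = Counter()
    bought = Counter()
    for _ in range(visitors):
        v = rng.choice(variants)          # random assignment: the key to causal inference
        shown[v] += 1
        if rng.random() < true_rates[v]:  # simulated visitor decides whether to buy
            bought[v] += 1
    return {v: bought[v] / shown[v] for v in variants if shown[v]}


observed = run_ab_test(TRUE_RATES, visitors=60_000)
best = max(observed, key=observed.get)    # keep the presentation with most purchases
print(observed, best)
```

This is a plain fixed-allocation test; sites can also shift traffic toward better-performing variants as data comes in, as multi-armed bandit methods do.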

Machine learning can’t predict previously unseen events, a.k.a. “black swans.” If something has never happened before, its predicted probability must be zero — what else could it be? On the contrary, machine learning is the art of predicting rare events with high accuracy. If A is one of the causes of B and B is one of the causes of C, A can lead to C, even if we’ve never seen it happen before. Every day, spam filters correctly flag freshly concocted spam emails. Black swans like the housing crash of 2008 were in fact widely predicted — just not by the flawed risk models most banks were using at the time.
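A minimal sketch of the A → B → C argument, assuming C depends on A only through an intermediate event (a Markov-style assumption) and using made-up counts. A and C never appear together in the data, so the direct estimate of P(C | A) is zero, yet chaining the separately observed A → B and B → C links gives a positive probability. `transition_probs` and `two_step_prob` are hypothetical helpers for this illustration.

```python
from collections import Counter, defaultdict

# Made-up observations of (cause, effect) pairs. ("A", "C") never occurs.
OBSERVED_PAIRS = ([("A", "B")] * 6 + [("A", "X")] * 4 +
                  [("B", "C")] * 3 + [("B", "Y")] * 7)


def transition_probs(pairs):
    """Estimate P(effect | cause) from raw co-occurrence counts."""
    counts = defaultdict(Counter)
    for cause, effect in pairs:
        counts[cause][effect] += 1
    return {cause: {e: n / sum(c.values()) for e, n in c.items()}
            for cause, c in counts.items()}


def two_step_prob(probs, start, end):
    """P(end | start) through one intermediate: sum over m of P(m | start) * P(end | m)."""
    return sum(p_mid * probs.get(mid, {}).get(end, 0.0)
               for mid, p_mid in probs.get(start, {}).items())


probs = transition_probs(OBSERVED_PAIRS)
direct = probs["A"].get("C", 0.0)          # never observed, so 0.0
chained = two_step_prob(probs, "A", "C")   # 0.6 * 0.3, i.e. about 0.18
print(direct, chained)
```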

The more data you have, the more likely you are to hallucinate patterns. Supposedly, the more phone records the NSA looks at, the more likely it is to flag an innocent as a potential terrorist because he accidentally matched a terrorist detection rule. Mining more attributes of the same entities can indeed increase the risk of hallucination, but machine learning experts are very good at keeping it to a minimum.
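A minimal simulation of how mining more attributes invites hallucinated patterns, assuming every attribute and the label are independent coin flips, so any “pattern” found is spurious by construction. It counts how many attributes pass a conventional 5% significance test by chance, then repeats on the same data with a Bonferroni correction, one standard safeguard (not necessarily the one the author has in mind). `spurious_hits` and its parameters are hypothetical names for this sketch.

```python
import random
from statistics import NormalDist


def spurious_hits(n_people, n_attributes, alpha=0.05, bonferroni=False, seed=0):
    """Count pure-noise attributes that look 'significantly' linked to a pure-noise label."""
    rng = random.Random(seed)
    label = [rng.random() < 0.5 for _ in range(n_people)]
    # Bonferroni correction: split the error budget alpha across all attributes tested.
    per_test_alpha = alpha / n_attributes if bonferroni else alpha
    z_crit = NormalDist().inv_cdf(1 - per_test_alpha / 2)  # two-sided threshold
    hits = 0
    for _ in range(n_attributes):
        attr = [rng.random() < 0.5 for _ in range(n_people)]  # independent of the label
        agree = sum(a == b for a, b in zip(attr, label))
        # Under independence, agree ~ Binomial(n_people, 0.5); use a normal approximation.
        z = (agree - n_people / 2) / (n_people * 0.25) ** 0.5
        if abs(z) > z_crit:
            hits += 1
    return hits


few = spurious_hits(500, 20)                        # few attributes mined
many = spurious_hits(500, 2000)                     # many attributes, same entities
guarded = spurious_hits(500, 2000, bonferroni=True) # same data, corrected threshold
print(few, many, guarded)
```

Because the same seed is reused, the three calls see the same label and the same leading attributes, so the runs differ only in how many attributes are tested and how strict the threshold is.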
