Danger! You’re Using The Wrong Data To Teach AI!
- by 7wData
Data is the fuel for artificial intelligence. The more data we have, the better the AI will learn and find those hidden patterns, right? Unfortunately, not so much. We have the ability to collect LOTS of data: consider the nearly 31 billion IoT devices producing information for machine consumption. However, lots of data does not translate into good data. As humans, we have not fully grasped which slice of data creates the real value in developing AI solutions. At the heart of this challenge are three major obstacles: 1) understanding what the real data is, 2) validity of belief, and 3) implicit bias.
So, what is “real data?” Simply put, it is the data the machine actually needs in order to learn and perform work. We have fallen into the trap of assuming that having big data gives us the key information to enable AI learning. The problem, though, is that more data can lead to more misconstructions and more opportunities for bias. Consider what Dr. De Kai from the Hong Kong University of Science and Technology has shared: it takes an AI system hearing roughly 100 million words to learn a language, but a human child needs to hear only about 15 million words to learn it. Why such a delta? We don’t fully know, but there is a strong argument that particular words and phrases, not sheer volume, demonstrate the intricacies of language. This suggests the secret lies in medium data, not big data. In other words, true AI skill development lies in using the critical data, not just large volumes.
To see the power of medium data, we can look at fake news detection. Unfortunately, there is a lot of fake news out there, with a very large amount of variability. More variability means more data the AI needs to learn from. However, computer scientists at the University of Washington and the Allen Institute for AI took a different approach. They created a system called Grover that learned how to write fake news so that it could better detect fake news. To write fake news articles, Grover had to learn what real news looks like by reading real news articles, which have much less variability than fake news. In effect, through their training strategy, they shrank the data requirement from vast big data down to a reduced data set.
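Grover itself is a large neural language model, but the core intuition — a generator’s own statistics are a strong detector of its own output, because machine-generated text looks suspiciously probable to the model that produced it — can be sketched with a toy bigram model. Everything below (the corpus, sentences, and function names) is illustrative, not the actual Grover implementation:

```python
import math
import random
from collections import defaultdict

# Tiny stand-in for a "real news" corpus (hypothetical sentences).
REAL_NEWS = [
    "the city council approved the budget after a long debate",
    "officials said the new bridge will open to traffic next month",
    "the mayor announced a plan to expand the public transit system",
    "researchers at the university published a study on water quality",
]

def train_bigram(sentences):
    """Count word-to-word transitions: our toy 'generator' model."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        for a, b in zip(tokens, tokens[1:]):
            counts[a][b] += 1
    return counts

def generate(counts, rng, max_len=20):
    """Sample a 'fake news' sentence by walking the bigram chain."""
    out, cur = [], "<s>"
    for _ in range(max_len):
        nxt = rng.choices(list(counts[cur]),
                          weights=list(counts[cur].values()))[0]
        if nxt == "</s>":
            break
        out.append(nxt)
        cur = nxt
    return " ".join(out)

def avg_logprob(counts, sentence, floor=1e-6):
    """Average per-token log-probability under the generator's own model.

    Machine-generated text only ever uses transitions the model has seen,
    so it scores high; fresh human text usually contains unseen
    transitions and scores much lower. That gap is the detection signal.
    """
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for a, b in zip(tokens, tokens[1:]):
        seen = sum(counts[a].values())
        p = counts[a][b] / seen if seen and counts[a][b] else floor
        total += math.log(p)
    return total / (len(tokens) - 1)

counts = train_bigram(REAL_NEWS)
fake = generate(counts, random.Random(0))
human = "aliens reportedly landed near the old stadium yesterday"
print("fake :", avg_logprob(counts, fake))
print("human:", avg_logprob(counts, human))
```

The design point mirrors the article’s argument: the model is trained only on low-variability “real news,” yet that narrow training set is exactly what lets it flag the machine-written text it could have produced itself.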
The obstacle of validity of belief is trickier to manage. Fundamentally, each of us has assumptions that we consider to be true and wind up holding as fact. For example, what color is the sun? Most people would say yellow, or maybe red or orange at sunset. However, the sun is actually white. (Sorry, Superman fans, but he shouldn’t really get any powers from a yellow sun.) Most people believe the sun is yellow because Earth’s atmosphere scatters out the short-wavelength blue light, so the sun winds up looking yellow. What’s the big deal, right? Imagine that we teach AI systems, as fact, that the sun is yellow.