Danger! You’re Using The Wrong Data To Teach AI!

Data is the fuel for artificial intelligence. The more data we have, the better the AI will learn and find those hidden patterns, right? Unfortunately, not so much. We have the ability to collect LOTS of data. Consider the nearly 31 billion IoT devices producing information for machine consumption. However, lots of data does not translate into good data. As humans, we have not fully grasped which slice of data creates the real value in developing AI solutions. At the heart of this challenge are three major obstacles: 1) understanding what the real data is, 2) validity of belief, and 3) implicit bias.

So, what is “real data?” Simply put, it is the data the machine really needs to learn and perform work. We have fallen into the trap of believing that having big data gives us the key information to enable AI learning. The problem, though, is that more data can lead to more misinterpretations and more opportunities for bias. Consider what Dr. De Kai from the Hong Kong University of Science and Technology has shared: It takes an AI system hearing roughly 100 million words to learn a language, but a human child only needs to hear approximately 15 million words to learn it. Why is there such a delta? We don’t fully know, but there is a strong argument that it is particular words and phrases that really demonstrate the intricacies of language, not sheer volume. This suggests that the secret lies in medium data, not big data. In other words, true AI skill development lies in using the critical data, not just large volumes.

To see the power of medium data, we can look at fake news detection. Unfortunately, there is a lot of fake news out there, and it has a very large amount of variability. More variability means more data that AI needs to learn from. However, computer scientists at the University of Washington and the Allen Institute for AI took a different approach. They created a system called Grover that learned how to write fake news so that it could better detect fake news. To write fake news articles, Grover had to learn what real news is by reading real news articles, which have much less variability than fake news. In effect, through their training strategy, they reduced the amount of data needed and went from vast big data needs to a smaller data set.

The obstacle of validity of belief is trickier to manage. Fundamentally, each of us has assumptions that we consider to be true and wind up holding as fact. For example, what color is the sun? Most people would say yellow, maybe red or orange at sunset. However, the sun is actually white. (Sorry Superman fans, but he shouldn’t really have any powers from a yellow sun.) Most people believe the sun is yellow because the Earth’s atmosphere scatters out the short-wavelength colors, so it winds up looking yellow. What’s the big deal, right? Imagine that we teach AI systems the sun is yellow as a fact.
