Machine Learning Data for Self-Driving Cars: Shared or Proprietary

Data is the crux of any machine learning approach. You need lots and lots of usable data to be able to “teach” a machine. One of the reasons that machine learning has progressed lately is the advent of Big Data, meaning tons of data that can be readily captured, stored, and processed.

Why is an abundance of data necessary for machine learning? Let’s use a simple but illustrative example. Imagine that you wanted to learn about birds and someone showed you only one picture of a bird (and furthermore, let’s assume you had never seen a bird in your lifetime). It might be difficult to generalize from one picture and discern the actual characteristics of a bird. If you saw perhaps 50 pictures, you’d have a greater chance of discovering that birds have wings, that they have beaks, and so on. If you saw thousands and thousands of pictures of birds, you’d really begin to figure out their characteristics, and even be able to classify birds by aspects such as distinctive colors, distinctive wing shapes, and so on.

Many self-driving car makers are using machine learning to imbue their AI systems with the ability to drive a car. What kind of data are the developers using to “teach” the automation to drive? They are capturing huge amounts of data generated while a car is being driven, collected from a myriad of sensors on the car. These sensors include cameras that capture images and video, radar devices that capture radar signals, LIDAR devices that capture laser-based distance point data, and the like. All of this data can be fed into a massive dataset and then crunched and processed by machine learning algorithms. Indeed, Tesla collects this data over the air from its cars and can enhance its existing driving algorithms by examining the data and using it to learn how its Autopilot software can improve as a driver of the car.

How much data are we talking about?

One estimate by Intel is the following:

Radar data: 10 to 100 KB per second

Camera data: 20 to 40 MB per second

Sonar data: 10 to 100 KB per second

LIDAR: 10 to 70 MB per second

If you add all that up, you get about 4,000 GB per day of data, assuming that a car is being driven about 8 hours per day. As a basis for comparison, it is estimated that the average tech-savvy person uses only about 650 MB per day when you add up all of the online social media, online video watching, online video chatting, and other such uses on a typical day.
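To make the arithmetic concrete, here is a minimal sketch that sums the per-sensor rates listed above over an 8-hour driving day. It assumes decimal units (1 MB = 1,000,000 bytes) and exactly one sensor of each type; the names `SENSOR_RATES` and `daily_volume_gb` are made up for this illustration and are not from Intel's estimate.

```python
# Per-sensor data rates from the list above, as (low, high) in bytes per second.
# Assumption: decimal units, i.e. 1 KB = 1e3 bytes and 1 MB = 1e6 bytes.
KB, MB, GB = 1e3, 1e6, 1e9

SENSOR_RATES = {
    "radar": (10 * KB, 100 * KB),
    "camera": (20 * MB, 40 * MB),
    "sonar": (10 * KB, 100 * KB),
    "lidar": (10 * MB, 70 * MB),
}


def daily_volume_gb(rates, hours_per_day=8.0):
    """Return the (low, high) data volume in GB for one driving day."""
    seconds = hours_per_day * 3600
    low = sum(lo for lo, _ in rates.values()) * seconds / GB
    high = sum(hi for _, hi in rates.values()) * seconds / GB
    return low, high


low_gb, high_gb = daily_volume_gb(SENSOR_RATES)
print(f"{low_gb:,.0f} to {high_gb:,.0f} GB per 8-hour day")
```

With one sensor of each type, the range works out to roughly 0.9 to 3.2 TB per day, which is the same order of magnitude as the 4,000 GB figure. The list does not say how many cameras or radar units a car carries, so the exact total depends on that assumption.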

The estimates of the data volumes collected by self-driving cars vary somewhat among the analysts and experts commenting on the data deluge. For example, it is said that Google Waymo’s self-driving cars generate about 1 GB every second while on the road, which makes it 60 GB per minute or 3,600 GB per hour, and thus for 8 hours it would be about 28,800 GB. Based on how much time the average human driver drives a car annually, it would be around 2 petabytes of data per year if you used the Waymo suggested collection rate of data.
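A quick unit conversion helps sanity-check the 1 GB per second rate. The sketch below assumes decimal units (1 PB = 1,000,000 GB), and the 600 hours of driving per year is an illustrative figure chosen here, not one stated in the article; `annual_petabytes` is a hypothetical helper name.

```python
GB_PER_SECOND = 1.0   # the quoted collection rate
GB_PER_PB = 1e6       # assumption: decimal petabyte

gb_per_minute = GB_PER_SECOND * 60
gb_per_hour = gb_per_minute * 60
gb_per_8h = gb_per_hour * 8


def annual_petabytes(hours_per_year):
    """Data collected in a year, in PB, at the quoted per-second rate."""
    return GB_PER_SECOND * 3600 * hours_per_year / GB_PER_PB


print(f"{gb_per_minute:,.0f} GB/min, {gb_per_hour:,.0f} GB/hour, "
      f"{gb_per_8h:,.0f} GB per 8 hours")
# 600 hours/year is an assumed value for illustration only.
print(f"{annual_petabytes(600):.2f} PB per year at 600 hours of driving")
```

At that rate, about 2 petabytes per year corresponds to somewhere between 550 and 600 hours of driving annually.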

There’s not much point in arguing about exactly how much data is being collected; instead, we need to focus on the simple and clear-cut fact that it is a lot of data. A barrage of data. A torrent of data. And that’s a good thing for this reason: the more data we have, the greater the chances of using it wisely for machine learning. Notice that I said we need to use the data wisely. If we just feed all this raw data into anything that we call “machine learning,” the results are unlikely to be very useful. Keep in mind that machine learning is not magic. It cannot miraculously turn data into supreme knowledge.

The data being fed into machine learning algorithms needs to be pre-processed in various fashions.
