Machine Learning Data for Self-Driving Cars: Shared or Proprietary
- by 7wData
The crux of any machine learning approach is data. You need lots and lots of usable data to be able to “teach” a machine. One reason machine learning has progressed lately is the advent of Big Data, meaning tons of data that can be readily captured, stored, and processed. Why is an abundance of data necessary for machine learning? Let’s use a simple but illustrative example. Imagine you wanted to learn about birds and someone showed you only a single picture of a bird (and let’s assume you had never seen a bird in your lifetime). It would be difficult to generalize from one picture and discern the actual characteristics of a bird. If you saw perhaps 50 pictures, you’d have a greater chance of discovering that birds have wings, that they have beaks, and so on. If you saw thousands and thousands of pictures of birds, you’d really begin to figure out their characteristics, and even be able to classify birds by aspects such as distinctive colors and distinctive wing shapes.
Many self-driving car makers are using machine learning to imbue their AI systems with an ability to drive a car. What kind of data are the developers using to “teach” the automation to drive? They are capturing huge amounts of data generated while a car is being driven, collected from a myriad of sensors on the car. These sensors include cameras that capture images and video, radar devices that capture radar signals, LIDAR devices that capture laser-based distance points, and the like. All of this data can be fed into a massive dataset, and then crunched and processed by machine learning algorithms. Indeed, Tesla collects this data over-the-air from its cars and can enhance its existing driving algorithms by examining the data and using it to learn how its Autopilot software can improve as a driver of the car.
How much data are we talking about?
One estimate by Intel is the following:
Radar data: 10 to 100 KB per second
Camera data: 20 to 40 MB per second
Sonar data: 10 to 100 KB per second
LIDAR: 10 to 70 MB per second
Add all that up and you get on the order of 4,000 GB of data per day, assuming that a car is being driven about 8 hours per day. As a basis for comparison, it is estimated that the average tech-savvy person uses only about 650 MB per day when you add up all of the online social media, online video watching, online video chatting, and other such uses on a typical day.
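To see how the per-sensor rates relate to the daily total, here is a minimal Python sketch of the arithmetic. It assumes decimal units (1 MB = 1,000 KB, 1 GB = 1,000 MB) and the 8 hours of driving per day mentioned above; the names `SENSOR_RATES_MB_PER_S` and `daily_volume_gb` are made up for illustration and do not come from Intel's estimate.

```python
# Per-sensor rates from the list above, as (low, high) in MB per second.
# Assumption: decimal units, so 10 KB/s = 0.01 MB/s and 100 KB/s = 0.1 MB/s.
SENSOR_RATES_MB_PER_S = {
    "radar": (0.01, 0.1),
    "camera": (20.0, 40.0),
    "sonar": (0.01, 0.1),
    "lidar": (10.0, 70.0),
}


def daily_volume_gb(rates, hours_per_day=8.0):
    """Return the (low, high) data volume in GB for one driving day."""
    seconds = hours_per_day * 3600
    low_mb_per_s = sum(low for low, _ in rates.values())
    high_mb_per_s = sum(high for _, high in rates.values())
    # Convert MB to GB by dividing by 1,000.
    return low_mb_per_s * seconds / 1000, high_mb_per_s * seconds / 1000


low_gb, high_gb = daily_volume_gb(SENSOR_RATES_MB_PER_S)
print(f"{low_gb:,.0f} to {high_gb:,.0f} GB per 8-hour driving day")
```

With the listed ranges, the sum comes to roughly 900 GB at the low end and roughly 3,200 GB at the high end. That is the same order of magnitude as the 4,000 GB quoted, so the quoted number is best read as a rounded estimate rather than an exact sum of these four ranges.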
Estimates of the amount of data being collected by self-driving cars vary somewhat among the analysts and experts commenting on the data deluge. For example, it is said that Google Waymo’s self-driving cars generate about 1 GB every second while on the road, which makes it 60 GB per minute, or 3,600 GB per hour, and thus about 28,800 GB for 8 hours of driving. Based on how much time the average human driver drives a car annually, it would be around 2 petabytes of data per year if you used the Waymo suggested collection rate of data.
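As a sanity check on the annual figure, the following Python sketch works backwards from the 1 GB per second rate to the amount of driving that 2 petabytes would imply. It assumes decimal units (1 PB = 1,000,000 GB); the helper names `volume_gb` and `implied_hours` are hypothetical and used only for this illustration.

```python
SECONDS_PER_HOUR = 3600
GB_PER_PB = 1_000_000  # assumption: decimal units


def volume_gb(rate_gb_per_s, hours):
    """Data volume in GB produced at a constant rate over a number of hours."""
    return rate_gb_per_s * hours * SECONDS_PER_HOUR


def implied_hours(total_gb, rate_gb_per_s):
    """Hours of driving needed to produce total_gb at a constant rate."""
    return total_gb / rate_gb_per_s / SECONDS_PER_HOUR


# How much driving per year does 2 PB correspond to at 1 GB per second?
yearly_hours = implied_hours(2 * GB_PER_PB, rate_gb_per_s=1.0)
print(f"{yearly_hours:.0f} hours per year, "
      f"about {yearly_hours / 365:.1f} hours per day")
```

At that rate, 2 petabytes corresponds to roughly 550 hours of driving per year, or about an hour and a half per day, which is a plausible amount of driving for an average human driver.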
There’s not much point in arguing over exactly how much data is being collected; instead we need to focus on the simple and clear-cut fact that it is a lot of data. A barrage of data. A torrent of data. And that’s a good thing for this reason: the more data we have, the greater the chances of using it wisely for machine learning. Notice that I said we need to use the data wisely. If we just feed all this raw data into anything that we call “machine learning,” the results are not likely to be very useful. Keep in mind that machine learning is not magic. It cannot miraculously turn data into supreme knowledge.
The data being fed into machine learning algorithms needs to be pre-processed in various fashions.