Big data has already made fundamental changes to the way businesses operate. There are huge advantages for companies that can derive value from their data, but these opportunities come with challenges, too. For some, the challenge is acquiring data from new sources. For others, it is building a scalable infrastructure that can manage the data in aggregate. For a brave few, it is extracting value from the data by applying advanced analytic techniques and tools.
For cloud service providers (CSPs) whose business depends on solving these challenges, the scale of online user-generated data inspired the development of radically different hardware for datacenter infrastructure and a new kind of software for orchestrating workloads intelligently and efficiently on that infrastructure. When these cloud computing technologies – designed to increase datacenter automation – were released to the open source community, they spawned projects such as Docker*, Kubernetes*, and Mesos*. At the same time, CSPs developed data storage and processing software that could handle the speed and scale of human-generated data. Apache Hadoop* and Spark* are children of these Big Data technologies. The concurrent rise of Data Science as a profession stems from the acute need to detect signal in the noise of this ever-increasing flood of data.
We now face a wave of data generation that is several orders of magnitude greater than the cumulative tracks of surfers, shoppers, and their social networks. We look with awe upon the data generated by smartphones, driverless cars, industrial drones, cube satellites, smart meters, surveillance cameras, and millions of other things that now populate the Internet. And after the shock subsides, we realize that the level of automation that allowed us to manage data in the cloud era must now scale several-fold in order to analyze data in the era of the Internet of Things (IoT).
Automation is the key to solving the challenge posed by IoT. We need things to get smarter when responding to their environment and users. We need systems to become more intelligent based on the history of their interactions. We need technologies and tools that can help these devices and systems learn from their experience. Where once we asked analysts to generate “insights” from data and make decisions that drove changes in system operation, now we must ask systems to learn from data automatically and respond appropriately. In short, we need Machine Learning to make IoT usable.
Machine Learning – the study, development, and application of algorithms that improve their performance on tasks based on prior interactions – is the key to making things that learn from experience and get “smarter” with use.
Consider the example of autonomous vehicles. They construct a model of the world based on data from millions of miles of driving by test cars equipped with sensors such as Radio Detection and Ranging (RADAR), Light Detection and Ranging (LIDAR), and cameras. They use data from maps to plan paths. But they are not programmed explicitly with rules for every scenario they might encounter in the real world. For cars to operate autonomously, they must be trained, much like human student drivers, to recognize objects in the visual field such as other vehicles, highway signs, lane markers, trees, and pedestrians. They must learn to navigate and control the movement of the vehicle in response to dynamic conditions. And much like a student driver, they learn by making mistakes and improving their accuracy with practice. At first, a trainer – the data scientist – annotates the training data to label correct responses and supervises the learning process of the algorithms that make up the model. But eventually, the model learns to recognize the objects, localize them in space, and track their movement well enough to operate in the real world.
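The train-on-labeled-data, improve-with-practice loop described above can be sketched in miniature. The toy model below is nothing like a real perception stack: it is a simple perceptron that learns a linear boundary between two classes of labeled points (the class names and the `train_perceptron`/`accuracy` helpers are purely illustrative). The point is only to show the pattern a data scientist supervises: start with an untrained model, let it adjust itself on each labeled mistake, and end with a model that responds correctly far more often than it did at the start.

```python
import random

def train_perceptron(data, labels, epochs=10, lr=0.1):
    """Learn a linear decision boundary from labeled 2D examples.

    Starts with zero weights (the untrained "student") and adjusts
    the weights only when the current prediction is a mistake --
    the algorithmic analogue of learning by practice.
    """
    w1, w2, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x, y), t in zip(data, labels):
            p = 1 if w1 * x + w2 * y + b > 0 else 0
            if p != t:  # wrong answer: nudge the boundary toward correctness
                w1 += lr * (t - p) * x
                w2 += lr * (t - p) * y
                b += lr * (t - p)
    return w1, w2, b

def accuracy(params, data, labels):
    """Fraction of labeled examples the model classifies correctly."""
    w1, w2, b = params
    correct = sum(
        (1 if w1 * x + w2 * y + b > 0 else 0) == t
        for (x, y), t in zip(data, labels)
    )
    return correct / len(data)

# Labeled training data (the data scientist's annotations): points
# above the line y = x are class 1, points below are class 0.
random.seed(0)
data = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
labels = [1 if y > x else 0 for (x, y) in data]

untrained = (0.0, 0.0, 0.0)
trained = train_perceptron(data, labels)
print("before training:", accuracy(untrained, data, labels))
print("after training: ", accuracy(trained, data, labels))
```

Real autonomous-driving models replace this two-weight line with deep neural networks and millions of labeled sensor frames, but the supervised-learning structure – labeled examples, mistake-driven updates, accuracy that improves with practice – is the same.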