Predictive analytics and machine learning: A dynamic duo
- by 7wData
Predictive analytics and Machine Learning working separately or together can be just what a company needs to succeed. But understanding how they work is key to figuring out how they can help businesses thrive.
So, what is predictive analytics? Datafloq's Mark van Rijmenam uses the car metaphor, according to which traditional, descriptive analytics is like looking at the rear-view mirror to see what has happened, while predictive analytics is using a navigation system to tell you what will happen, and prescriptive analytics is a self-driving car that knows how to take you to your destination.
This metaphor, while easy to comprehend, may also be deceptively simple. It certainly is open to interpretation, so it's a good starting point for discussion. Some might say that a navigation system presumably has access to all the data regarding potential routes. So is suggesting a route based on that data really a prediction? Isn't that something algorithmic, deterministic, thus not really "intelligent"? Or is this a matter of definitions -- semantics?
It depends on how a navigation system is defined and how it works. Typically, navigation systems do not try to predict where do you want to go today. What they do instead is they wait to get specific instructions and then they figure out how to get from point A (either explicitly given as the starting point or calculated using GPS geo-location) to point B.
Let us examine a different example: Boarding Gate Readers (BGRs). BGRs are able to indicate whether a certain person should be granted access to a certain area of an airport at a certain time. For non-tech people, this is equally mystifying as a navigation system: how does the system "know" what to do, what the right answer/action is?
For techies, both examples are nothing to write home about: there is a database with all the information (streets and distances, passenger lists), there is an algorithm determining the output for the given input (fastest route from A to B, whether passenger X is in the list for flight Y), there is a medium that connects the system with the outside world (GPS position, bar-code reader). In fact, there is no real prediction involved in either system.
When looked at under that lens, these systems may differ in terms of implementation details and complexity of algorithms and data, but they are fundamentally not that far apart. Still, while few people in the tech industry would classify a BGR as a predictive system, presumably some would do so for a navigation system. Is the fact that BGRs respond with a binary (access/no access) answer, while a navigator responds with specific instructions a differentiating factor?
To answer this, let's look at another example: identifying malware. As described by Kaspersky's Alexey Malanov, this used to be possible using rather straightforward algorithms and rules. At some point, the search space (i.e. the number of potential malware to identify) became so big and started expanding so fast that it was very hard to devise rules that would cover it in its entirety and keep up to date. Hence, enter Machine Learning (ML).
Malanov shows how ML can be used to perform the same task -- identifying malware -- more efficiently. The essence of how this works is by using an algorithm implementing heuristic rules based on metrics (in this case, letter sequence frequency) and a curated dataset to train the algorithm. The process is different, there are quite a few gotchas along the way, but the end result is basically the same: the ability to respond to input with a binary answer of malware/not malware.
So, is a navigator all that different? The two examples share some similarities -- they have a big search space and devising algorithms to cover it in its entirety is pretty hard. What Malanov's example shows is how a ML algorithm works as a function that classifies input into binary output.
Â
[Social9_Share class=”s9-widget-wrapper”]
Upcoming Events
Evolving Your Data Architecture for Trustworthy Generative AI
18 April 2024
5 PM CET – 6 PM CET
Read MoreShift Difficult Problems Left with Graph Analysis on Streaming Data
29 April 2024
12 PM ET – 1 PM ET
Read More