Big data flows from all channels in the modern technological world: social, mobile, networks, sales, machines, sensors, markets, etc. In fact, big data flows so abundantly that we choose water-themed metaphors to describe it: data lake, data flood, data tsunami, oceans of data, streaming data, and even the sea of data. As we navigate through these deep waters of data, we need to “mind the wheel”. That is, we need to use exploratory data analytics and advanced data mining methods to steer from the uncharted seas of bytes onward to the safe shores of analytics success: faster, better, cheaper insights and knowledge discovery.
Let us change the wording of our metaphor from “mind the wheel” to “mine the wheel”, and specifically to “mine the big data wheel.” With this rewording, the goal of our data analytics activities is explicit: to mine the data! Since data mining is KDD (Knowledge Discovery from Data), our goal is clear. Here are three ways that we can mine the big data wheel:
1. Data are created and emitted in prodigious quantities from large computational models running on high-performance computers. It is almost impossible for us to keep up with these output data streams. So, it is beneficial (perhaps imperative) to mine the data as they pass from computational processor to data storage device. In other words, data have inertia: it is very difficult to get the data moving again after they have become stationary (on storage media), and conversely, the data have lots of power for knowledge discovery while they are moving through processors. Therefore, mine the data as they are moving, using in-memory analytics algorithms embedded in the computational modeling package. As the wheel of data turns within the modeling process, search for significant patterns, new trends, and anomalous behaviors in real time (not after the model has turned cold). In this way, you may also introduce an autonomous fast-response feedback loop into the model, to iterate, zoom in, or otherwise react to interesting emergent features in the massive streaming data outputs. The big data analytics processing capability of a Hadoop cluster in the cloud offers one approach to this “mining the big data wheel” use case.
2. Data will be collected and transmitted from billions and billions of sensors in the coming years as the Internet of Things (IoT) reaches full bloom. The IoT will “sensor” the world. Data will be streaming from ubiquitous devices, people, processes, supply chains, engines, manufacturing lines, networks (social, financial, computer), and so on. It will be essentially impossible to go back later and mine these data for emergent, anomalous, interesting, profitable, or adversarial patterns. So, again, it is beneficial (perhaps imperative) to mine this big data wheel as the IoT sensors are turning and churning out larger quantities of data than we could ever imagine.
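To make "mine the data as they are moving" a bit more concrete, here is a minimal Python sketch of in-memory stream mining. It is an illustration only, not a method prescribed above: the names `RunningStats` and `mine_stream`, and the `threshold` and `warmup` parameters, are hypothetical, and the technique chosen (Welford's one-pass mean and variance with a simple z-score test) is just one of many ways to flag anomalous values without first landing the data on storage.

```python
import math
from typing import Iterable, Iterator, Tuple


class RunningStats:
    """Welford's online algorithm: mean and variance in one pass, O(1) memory."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; defined as 0.0 until two values are seen.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def mine_stream(
    values: Iterable[float],
    threshold: float = 3.0,  # assumed z-score cutoff, purely illustrative
    warmup: int = 10,        # assumed number of values seen before flagging
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for readings far from the running baseline.

    Each value is examined once, as it passes by; nothing is stored.
    """
    stats = RunningStats()
    for i, x in enumerate(values):
        if stats.n >= warmup and stats.std > 0:
            z = abs(x - stats.mean) / stats.std
            if z > threshold:
                yield i, x
                continue  # design choice: keep flagged values out of the baseline
        stats.update(x)


if __name__ == "__main__":
    # A made-up sensor stream with one obvious spike at index 20.
    stream = [10.0, 10.1, 9.9, 10.2, 9.8] * 4 + [50.0, 10.0, 10.1]
    for index, value in mine_stream(stream):
        print(f"anomaly at position {index}: {value}")
```

Because `mine_stream` is a generator, it can sit between a data producer (a running model or a sensor feed) and the storage layer, and the flagged values could drive the kind of fast-response feedback loop described in item 1. A production system would likely need windowing or decay so that the baseline can follow drifting data.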