Back in 1944, Wesleyan University librarian Fremont Rider wrote a paper estimating that American university libraries were doubling in size every sixteen years. At that rate, the Yale Library would occupy more than 6,000 miles of shelves by 2040. This is not big data as most people know it, but the explosive growth in the quantity and variety of information held by the Yale library illustrates the same principle.
The concept was not known as big data back then, but technologists today face a similar challenge in handling such vast amounts of information. The problem is not necessarily how to store it, but how to make use of it. The promise of big data, and of data analytics more generally, is to provide intelligence, insight and predictability, but only now is technology becoming advanced enough to capitalise on the vast amount of information available to us.
In 2003 and 2004, Google published papers on its Google File System and MapReduce, which are generally credited as the beginning of the Apache Hadoop platform. At the time, few people could have anticipated the explosion of technology we have since witnessed. Cloudera Chairman and CSO Mike Olson is one of those few, and he also leads a company regularly cited as one of the go-to organizations for the Apache Hadoop platform.
“We’re seeing innovation in CPUs, in optical networking all the way to the chip, in solid state, highly affordable, high performance memory systems, we’re seeing dramatic changes in storage capabilities generally. Those changes are going to force us to adapt the software and change the way it operates,” said Olson, speaking at the Strata + Hadoop event in London. “Apache Hadoop has come a long way in 10 years; the road in front of it is exciting but is going to require an awful lot of work.”
Analytics was previously seen as an opportunity for companies to look back at their performance over a defined period and draw lessons for employees on how future performance could be improved. Today, advanced analytics is increasingly applied to improve performance in real time.