Collecting, understanding and utilising big data is a fundamental requirement for businesses wishing to remain competitive. Those organisations that master big data will be the kings and queens of their respective industries, improving operational efficiency and customer experience along the way.
Businesses often spend 95% of their time looking for the relevant data and only 5% actually using it. This is neither efficient nor productive, and could quite conceivably lead to an organisation's fall, or at the very least its stagnation.
For a long time, Hadoop was seen as the most effective software solution for tackling big data. Its function: to store and process big data in an easy, simple and fast manner. But is that still the case?
As more and more businesses become reliant on big data to thrive and survive, software like Hadoop will become an increasingly valued commodity.
In its most unflattering form, Hadoop is defined on searchcloudcomputing.com as ‘an open source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment’. It is part of the Apache project managed by the Apache Software Foundation.
This is an accurate and fair description, but it is hardly a love sonnet. Businesses that operate Hadoop will perhaps lean towards a more romantic view of the software that has made their business into a more efficient beast.
‘Hadoop is a unique architecture designed to enable organisations to gain new analytic insights and operational efficiencies,’ said Carole Murphy, product director for data security at HPE Security. ‘The resulting flexibility, performance and scalability are unprecedented.’
The platform has significant business benefits in storing and processing big data through, as Murphy reveals, ‘the use of multiple standard, low-cost, high-speed, parallel processing nodes operating on very large sets of data’.
Storing data is the first function Hadoop offers, as Forrester analyst Mike Gualtieri explains: ‘Hadoop lets you store files that are bigger than what can be stored on one particular node or server. So you can store very, very large files. It also lets you store many, many files.’
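The storage model Gualtieri describes rests on HDFS splitting each file into fixed-size blocks and replicating those blocks across nodes. The sketch below is purely illustrative: the 128 MB block size and threefold replication match HDFS defaults, but the node names and round-robin placement are invented for the example, not how Hadoop's actual placement policy works.

```python
# Illustrative sketch of HDFS-style storage: a file larger than any single
# node is cut into fixed-size blocks, and each block is copied to several
# nodes. Block size and replication factor match HDFS defaults; the node
# list and placement scheme are invented for illustration.

BLOCK_SIZE = 128 * 1024 * 1024   # HDFS default block size: 128 MB
REPLICATION = 3                  # HDFS default replication factor

def split_into_blocks(file_size_bytes, block_size=BLOCK_SIZE):
    """Return (block_index, block_length) pairs covering the file."""
    blocks = []
    offset, index = 0, 0
    while offset < file_size_bytes:
        length = min(block_size, file_size_bytes - offset)
        blocks.append((index, length))
        offset += length
        index += 1
    return blocks

def place_replicas(blocks, nodes, replication=REPLICATION):
    """Toy round-robin placement: each block's replicas on distinct nodes."""
    return {
        index: [nodes[(index + r) % len(nodes)] for r in range(replication)]
        for index, _ in blocks
    }

# A 300 MB file cannot live on one "file on one server" basis: it becomes
# three blocks (128 MB + 128 MB + 44 MB) spread over the cluster.
blocks = split_into_blocks(300 * 1024 * 1024)
placement = place_replicas(blocks, ["node1", "node2", "node3", "node4"])
```

The point of the sketch is the one Gualtieri makes: because no single node ever has to hold the whole file, the cluster can store ‘very, very large files’ and very many of them.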
It allows a business to store data that was previously too expensive to keep. MapReduce is Hadoop's second function: it processes the data, or rather provides a framework for processing it. This is where Hadoop excels.
Moving data over a network can be painfully slow. MapReduce, operating within Hadoop, solves this problem by moving the processing software to the data, rather than the data to the processing – it operates from the inside.
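The programming model behind this is simple enough to sketch. The classic example is a word count: a map phase emits a `(word, 1)` pair for every word, a shuffle groups pairs by word, and a reduce phase sums each group. A real Hadoop job runs the map tasks on the nodes that hold the data blocks; the sketch below simulates all three phases in a single process purely to show the flow, and all the function names are this example's own, not Hadoop APIs.

```python
# Illustrative word-count sketch of the MapReduce model Hadoop implements.
# Real Hadoop ships the map code to the nodes holding the data; here the
# map, shuffle and reduce phases run in-process to show the data flow.
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as Hadoop does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: sum the counts for one word."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big insight", "big value"])
# counts == {"big": 3, "data": 1, "insight": 1, "value": 1}
```

Because each map call touches only its own line and each reduce call only its own key, the phases can be spread across thousands of machines without the pieces needing to talk to one another mid-phase – which is precisely what makes shipping the computation to the data practical.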
In terms of how Hadoop works technically, Apache Software Foundation describes it as a framework that is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
Again, for those from a non-scientific background, this is neither the most romantic nor the most coherent description.
What is important, some would suggest, is not how it works, but what benefits it creates for businesses seeking to tap into big data.
Gaining the advantage
So, how important is tapping into the proverbial big data gold mine for businesses?
‘Imperative, if you will,’ Jules Damji, Spark community evangelist at Databricks, told Information Age.
‘Data is everywhere, and it’s growing in velocity, volume and variety, in all sectors of business and in all industry verticals. Big data is the new competitive advantage, just as automation in manufacturing enables production at scale or just as IT innovation enables productivity at scale.