We have watched Hadoop grow through two significant phases of maturity, defined by how companies first used it and what tools emerged to create a vibrant ecosystem in the process. Recently, we’ve started seeing another set of changes that clearly indicate that we are entering a third phase of Hadoop maturity – one that is more robust and characterized by new kinds of functionality and accessibility.
In the early days of Hadoop, it was a new tool that a few scattered groups explored for research projects. Users could run MapReduce and HBase, and early tools like Pig and Hive made Hadoop easier to use. During this initial phase, people still thought in terms of “writing jobs” and “will this job complete at all?” rather than in terms of applications, workflows, predictable run times, and operability.
In the second phase, once it became clear that Hadoop could provide real business value, departments started building workloads for business intelligence and reporting to extract meaningful insights. Suddenly, IT needed to care about predictable run times, running different kinds of workloads across a shared infrastructure (for example, running MapReduce and HBase together), efficiency and ROI, disaster recovery, and similar concerns that are typical of “real” IT projects.
As in the previous phase, this second phase of Hadoop maturity was mirrored by increasing development of the Hadoop ecosystem as a whole. This is where innovations like YARN, Spark (for lightning-quick streaming and analytics), and Kudu entered the scene.
Today, it’s clear that we are entering a third phase of Hadoop within enterprise environments. In this new phase, Hadoop is accessible to all business units, and we begin to see multi-departmental uses. IT organizations must now serve all of these business units, a model that many call “Hadoop as a Service” or “Big Data as a Service.”
With this third phase comes a whole new set of requirements for Hadoop that are important to consider. For example, when multiple departments are using shared infrastructure, they demand SLAs: it simply doesn’t work if one group’s use of Hadoop slows down other projects beyond an acceptable limit. And as Hadoop consumes an increasing share of a company’s IT capacity, it’s more important than ever that it be efficient.