Doug Cutting Reflects on Hadoop’s Impact, Future

The first Hadoop cluster went into production in January 2006, and the platform spent the next decade ushering in the era of big data analytics, fundamentally changing every industry and organization that values knowledge and insight. It quickly attracted an active and devoted community of open-source contributors, and even large enterprises that once turned up their noses took notice.

In a conversation with Data Informed, Hadoop co-creator Doug Cutting, now Chief Architect at Cloudera, recalled the origins and pondered the future of the transformational data processing platform named for an elephant that lives in his sock drawer.

Data Informed: Could you have imagined a decade ago that Hadoop would become as widely adopted and important to industry as it is?

Doug Cutting: No, it wasn’t on my mind at all a decade ago. What was on my mind then was trying to build an open-source project that would survive. My goal was to work on software that would get used and keep being used, ideally. I learned through [the Apache] Lucene [project] that open-source was a great way to do this. It almost gave you an unfair advantage toward adoption. People would adopt it very readily and use the heck out of it because they didn’t have to pay anything, and they could even help fix it when they had problems.

The software we had in [the Apache] Nutch [project] wasn’t to the point that anyone could pick it up, easily use it, and see the value. It was pretty raw stuff and it needed help. And that was why I joined Yahoo 10 years ago, renamed those core components that needed this work Hadoop, and tried to build a community around that and build the robustness of the software so that it would attract a community. It’s a bit of a chicken-and-egg problem there, one that Yahoo helped us solve in 2006, 2007, where we really made it robust enough that other folks could get involved.

By 2008, 2009, we clearly had something that was going to succeed, was helping people, and was going to be a project that had a life of its own. And that’s as far as I had imagined 10 years ago. So at that point, we succeeded.

What are your thoughts about what this platform has become? What surprises you most about it?

Cutting: I think the surprising part is the cultural part. There was a culture of enterprise software, and people would only trust things that came from very establishment companies, the IBM database, the Oracle database, the Microsoft database. And I had always worked on this flaky, fringy software that didn’t have anything to do with that tradition. For the things I worked on, we didn’t use relational database software, nor did we expect people in the enterprise ever to use the software we worked on. To me, the biggest surprise is that, to a large degree, those two communities have merged. Big banks, insurance companies, railways, and retailers now accept that open source is a valuable source for technology and they are willing to bring it in-house. And the open-source community is now respecting big enterprises as a valuable destination and collaborating with these folks and helping deliver products that meet their needs, taking security and reliability much more seriously. That change I didn’t see coming, that these two communities of software development have come to accept one another and are now, to some degree, merged.

Where do you think Hadoop is in its lifecycle? Is it mature or do you still see a high ceiling in terms of usefulness and potential?

Cutting: It’s hard to declare that definitively. My standard answer is, “It’s an adolescent.” But, more realistically, as long as software is evolving, it’s alive. If software stops evolving, it’s dead. It’s now – what’s the polite word we use for old, dead software? Legacy. And I think it’s a long way from being legacy, although there are components of Hadoop that are already becoming legacy. MapReduce is on its way to being legacy. We’ve got Spark, and new things that are replacing it.

 
