Doug Cutting Reflects on Hadoop’s Impact, Future
- by 7wData
The first Hadoop cluster went into production in January 2006 and spent the next decade ushering in the era of big data analytics, fundamentally changing every industry and organization that values knowledge and insight. It quickly attracted an active and devoted community of open-source contributors, and even large enterprises stopped looking down their noses and took notice.
In a conversation with Data Informed, Hadoop co-creator Doug Cutting, now Chief Architect at Cloudera, recalled the origins and pondered the future of the transformational data processing platform named for an elephant that lives in his sock drawer.
Data Informed: Could you have imagined a decade ago that Hadoop would become as widely adopted and important to industry as it is?
Doug Cutting: No, it wasn’t on my mind at all a decade ago. What was on my mind then was trying to build an open-source project that would survive. My goal was to work on software that would get used and keep being used, ideally. I learned through [the Apache] Lucene [project] that open-source was a great way to do this. It almost gave you an unfair advantage toward adoption. People would adopt it very readily and use the heck out of it because they didn’t have to pay anything, and they could even help fix it when they had problems.
The software we had in [the Apache] Nutch [project] wasn’t to the point that anyone could pick it up, easily use it, and see the value. It was pretty raw stuff and it needed help. And that was why I joined Yahoo 10 years ago, renamed those core components that needed this work Hadoop, and tried to build a community around that and build the robustness of the software so that it would attract a community. It’s a bit of a chicken-and-egg problem there, that Yahoo helped us solve in 2006, 2007, where we really made it robust enough that other folks could get involved.
By 2008, 2009, we clearly had something that was going to succeed, was helping people, and was going to be a project that had a life of its own. And that’s as far as I had imagined 10 years ago. So at that point, we succeeded.
What are your thoughts about what this platform has become? What surprises you most about it?
Cutting: I think the surprising part is the cultural part. There was a culture of enterprise software, and people would only trust things that came from very establishment companies, the IBM database, the Oracle database, the Microsoft database. And I had always worked on this flaky, fringy software that didn’t have anything to do with that tradition. For the things I worked on, we didn’t use relational database software, nor did we expect people in the enterprise ever to use the software we worked on. To me, the biggest surprise is that, to a large degree, those two communities have merged. Big banks, insurance companies, railways, and retailers now accept that open source is a valuable source for technology and they are willing to bring it in-house. And the open-source community is now respecting big enterprises as a valuable destination and collaborating with these folks and helping deliver products that meet their needs, taking security and reliability much more seriously. That change I didn’t see coming, that these two communities of software development have come to accept one another and are now, to some degree, merged.
Where do you think Hadoop is in its lifecycle? Is it mature or do you still see a high ceiling in terms of usefulness and potential?
Cutting: It’s hard to declare that definitively. My standard answer is, “It’s an adolescent.” But, more realistically, as long as software is evolving, it’s alive. If software stops evolving, it’s dead. It’s now – what’s the polite word we use for old, dead software? Legacy. And I think it’s a long way from being legacy, although there are components of Hadoop that are already becoming legacy. MapReduce is on its way to being legacy. We’ve got Spark, and new things that are replacing it.