An inside look at why Apache Kafka adoption is exploding

Apache Kafka, the open source distributed streaming platform, is making an increasingly vocal claim for stream data "world domination" (to borrow Linus Torvalds's whimsically modest initial goal for Linux). Last summer I wrote about Kafka and the company behind its enterprise rise, Confluent. Kafka adoption was accelerating as the central platform for managing streaming data in organizations, with production deployments at six of the top 10 travel companies, seven of the top 10 global banks, eight of the top 10 insurance companies, and nine of the top 10 US telecom companies.

Today, it's used in production by more than a third of the Fortune 500.

But 2016 may be most noted for Kafka joining the "Four Commas Club," a nod to the popular HBO comedy series "Silicon Valley," in which the flashy and obnoxious billionaire investor Russ Hanneman is quick to point out to the show's tormented heroes that it takes three commas to make the number 1,000,000,000. Last year LinkedIn, Microsoft, and Netflix all passed the threshold of processing more than one TRILLION messages a day over Kafka. That's four commas: 1,000,000,000,000. That's scale.

I asked Confluent CTO and co-founder Neha Narkhede what was behind these numbers.

TechRepublic: Kafka is putting up very large numbers, even for the big data world. I think a lot of people saw the technology as a messaging queue that scaled, kind of a scale-out enterprise bus that moved data very fast. Something more seems to be happening here.

Narkhede: My co-founders and I originally created Kafka at LinkedIn in 2010, when our own systems ran up against the limits of a monolithic, centralized database. We saw the need for a distributed architecture with microservices that we could scale quickly and robustly. The legacy systems couldn't help us anymore: on one hand, the traditional messaging queues were real-time but didn't scale; on the other, the ETL systems couldn't handle data in real time.

We looked deeply into the architecture of existing systems and why they didn't work, and combined that with our experience in modern distributed systems to create Kafka. It was built to be real-time, could store data to feed batch systems from the same pipe, and could enable stream processing to make sense of the data in real time in addition to moving it around. We had the vision of building the entire company's business logic as stream processors that express transformations on streams of data.

In order to do that, you need a highly efficient pipe to move data around, need connectors to existing systems, and need a stream processing layer. That is what we call a complete streaming platform. So though Kafka started off as a very scalable messaging system, it grew to complete our vision of being a distributed streaming platform.
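The idea Narkhede describes — business logic expressed as transformations over streams of records flowing through topics — can be illustrated without a Kafka cluster. The following is a purely conceptual sketch in plain Python: the `Topic` class and `page_view_counts` processor are toy stand-ins of my own invention, not Kafka APIs, and real Kafka code would use the producer/consumer or Kafka Streams libraries against a running broker.

```python
from collections import deque

class Topic:
    """Toy stand-in for a Kafka topic: an append-only log
    that a consumer reads in arrival order."""
    def __init__(self):
        self.log = deque()

    def produce(self, record):
        self.log.append(record)

    def consume(self):
        # Yield records in the order they were produced.
        while self.log:
            yield self.log.popleft()

def page_view_counts(views, counts):
    """A stream processor: reads raw page-view events from one
    topic, keeps a running count per page, and writes the updated
    counts downstream to another topic."""
    totals = {}
    for event in views.consume():
        page = event["page"]
        totals[page] = totals.get(page, 0) + 1
        counts.produce({"page": page, "count": totals[page]})
    return totals

views, counts = Topic(), Topic()
for page in ["/home", "/pricing", "/home"]:
    views.produce({"page": page})

totals = page_view_counts(views, counts)
print(totals)  # {'/home': 2, '/pricing': 1}
```

The point of the sketch is the shape, not the code: the same pipe feeds real-time consumers, batch systems, and stream processors alike, which is what distinguishes a streaming platform from a plain message queue.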

TechRepublic: Are there many enterprises doing real-time stream processing at scale today? A lot of the Fortune 500 is still wondering how to monetize all the data they sucked into their Hadoop clusters in the first place.

Narkhede: It's true that most of the sophisticated backend data processing in enterprises is actually conducted by big batch processes that run on big daily data dumps (the Hadoop people rely on Kafka as the preferred data pipeline to their Hadoop clusters, by the way).


