Mid-year Updates for Big Data Trends: Apache Kafka, Spark, Flink, Drill and More


In January, I made predictions about six big data trends for 2016 (“What Will You Do in 2016?”). Now that we’ve reached mid-year (and a bit more), it’s a good time to see how well these predictions match what has happened so far in 2016, what is surprising about that, and what’s likely to come in the second half of the year.

As a spoiler, I actually predicted Pokémon Go….

“You will come up with some innovative way to put big data to use that has not yet occurred to me.”

Yes, it’s true that I had no idea when I wrote this prediction that people of all ages would be walking around outside with their smartphones, “catching” virtual beasties. I am writing this post less than two weeks after the release of Pokémon Go in Australia, New Zealand and the US. As of 5 days a(Pokémon)go, the game was reported to be the most active mobile game ever in the US, and its daily unique users had surpassed those of Twitter. So while I did not predict the game itself, or its fantastic initial impact, I did predict that you would surprise me. And you did.

I also predicted a huge upsurge in people putting streaming data to use in new ways and a big presence for telecommunications in the big data arena. Pokémon Go bears out those predictions as well. Looking good so far…

Now let’s revisit each of the predictions in a more serious way.

“There will be explosive interest in streaming data and streaming analytics.”

Yes, yes, yes. As predicted, there’s a lot of excitement around the topic of streaming data, for both transport and processing. The popular Apache Spark project provides Spark Streaming to handle processing in near real time through a mostly in-memory micro-batching approach. And as I suggested, there is increasing interest in the Apache Flink project, including outside of Europe, where it originated. Flink is a streaming data engine that makes it possible to process data in real time or in batch mode, with high throughput and fault tolerance guarantees.
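To make the micro-batching idea concrete, here is a minimal sketch in plain Python. It does not use the Spark or Flink APIs; the names `micro_batches` and `count_keys` and the sample events are hypothetical, chosen only for illustration. It assumes events arrive in timestamp order and groups them into fixed-length time windows, each of which is then processed as a small batch.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, key); a hypothetical event shape


def micro_batches(events: Iterable[Event], interval: float) -> Iterator[List[Event]]:
    """Group time-ordered events into consecutive windows of `interval` seconds.

    The first window starts at the first event's timestamp. Windows with no
    events are yielded as empty lists, so downstream code sees every interval.
    """
    batch: List[Event] = []
    window_end = None
    for ts, key in events:
        if window_end is None:
            window_end = ts + interval
        # Close every window that ended before this event arrived.
        while ts >= window_end:
            yield batch
            batch = []
            window_end += interval
        batch.append((ts, key))
    if batch:
        yield batch


def count_keys(batch: List[Event]) -> Counter:
    """Process one micro-batch: count how often each key occurred."""
    return Counter(key for _, key in batch)


sample = [(0.0, "a"), (0.4, "b"), (1.1, "a"), (1.2, "a"), (3.5, "c")]
for i, batch in enumerate(micro_batches(sample, interval=1.0)):
    print(i, dict(count_keys(batch)))
```

The trade-off this illustrates is latency versus per-event overhead: results are only available once a window closes, so the batch interval sets a floor on how "real-time" the output can be. An engine that processes each event as it arrives does not have that floor.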

I also predicted there would be a rise in awareness of messaging tools with particular capabilities to support efficient streaming architectures. This shift shows up as the increasing popularity of the message transport known as Apache Kafka and of the new messaging system called MapR Streams, which supports the Kafka 0.9 API but is integrated into the MapR converged data platform. Both have happened, at an even higher level than I would have thought.

Message streams, shown here as horizontal cylinders, are the heart of a streaming architecture. Multiple applications (consumers) can share the streaming data without danger of cross-interference. Here we remind you of four popular data processors, although you would not likely be using them all at once. (image © E.Friedman 2016)
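The reason multiple consumers can share a stream without interfering with one another can be sketched with a toy model. This is not the Kafka or MapR Streams API; the class `MessageLog`, its methods, and the consumer names are hypothetical. The assumption is that messages live in an append-only log and that each consumer (or consumer group, in Kafka's terms) keeps its own read position, so reading never removes a message.

```python
from typing import Dict, List


class MessageLog:
    """A toy append-only log, standing in for one partition of a topic."""

    def __init__(self) -> None:
        self._messages: List[str] = []
        self._offsets: Dict[str, int] = {}  # consumer name -> next offset to read

    def append(self, message: str) -> int:
        """Add a message to the end of the log and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, consumer: str, max_messages: int = 10) -> List[str]:
        """Return the next unread messages for this consumer only.

        Reading advances this consumer's offset; the log itself and every
        other consumer's offset are left untouched.
        """
        start = self._offsets.get(consumer, 0)
        chunk = self._messages[start:start + max_messages]
        self._offsets[consumer] = start + len(chunk)
        return chunk


log = MessageLog()
for msg in ["login", "click", "purchase"]:
    log.append(msg)

# Two hypothetical consumers read the same stream at their own pace.
print(log.poll("analytics", max_messages=2))
print(log.poll("archiver"))
print(log.poll("analytics"))
```

Because the only per-consumer state is an offset, adding a new consumer later does not require any change to producers or to existing consumers, which is what makes the message stream a natural center for the architecture.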

Based on topics discussed at international big data conferences in the spring and early summer, such as the Strata conferences in San Jose and London, the Hadoop Summit conferences in Dublin and San Jose, Spark Summit in San Francisco and the Berlin Buzzwords conference, streaming data is very much the rage. With co-author Ted Dunning, I’ve done a lot of book signings for the O’Reilly publication titled Streaming Architecture, and the people who show up are enthusiastically seeking information about how to design streaming projects and about the technologies that best support them.

The Berlin Buzzwords conference in June, for instance, particularly demonstrated people’s enthusiasm for streaming data, with 17 presentations on stream-related topics, including a keynote on streaming with Kafka-style message transport and 9 technical talks on Apache Flink.

“Businesses want practical ways to get to value faster, … you are likely to try out Apache Drill some time in 2016 if your business has any need for SQL.”

Apache Drill has had a good year so far, with substantial advances in the April version 1.6 release and additional improvements in the latest release, version 1.7, in June.

 
