In January, I made predictions about six big data trends for 2016 (“What Will You Do in 2016?”). Now that we are a bit past the midpoint of the year, it’s a good time to check how well these predictions match what has happened so far in 2016, what is surprising about that, and what’s likely to come in the second half of the year.
As a spoiler, I actually predicted Pokémon Go….
“You will come up with some innovative way to put big data to use that has not yet occurred to me.”
Yes, it’s true that I had no idea when I wrote this prediction that people of all ages would be walking around outside with their smartphones, “catching” virtual beasties. I am writing this post less than two weeks after the release of Pokémon Go in Australia, New Zealand and the US. As of 5 days a(Pokémon)go, this product is reported to be the most active mobile game ever in the US, and its daily unique users have surpassed those of Twitter. So while I did not predict the game itself (or its fantastic initial impact), I did predict that you would surprise me. And you did.
I also predicted a huge upsurge in people putting streaming data to use in new ways AND a big presence for telecommunications in the big data arena. Pokémon Go bears out those predictions as well. Looking good so far…
Now let’s revisit each of the predictions in a more serious way.
“There will be explosive interest in streaming data and streaming analytics.”
Yes, yes, yes. As predicted, there’s a lot of excitement around the topic of streaming data, for both transport and processing. The popular Apache Spark project provides Spark Streaming to handle processing in near real-time through a mostly in-memory micro-batching approach. And as I suggested, there is increasing interest in the Apache Flink project, including outside of Europe where it originated. Flink is a streaming data engine that makes it possible to process data in real-time or in batch mode, with high throughput and fault tolerance guarantees.
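To make the micro-batching idea concrete, here is a minimal sketch in plain Python. It is not Spark Streaming's API; the function name `micro_batches`, the millisecond timestamps and the sample events are all assumptions chosen for illustration. The point is only that incoming events are grouped into small fixed-interval batches, and each batch is then processed as a unit.

```python
from collections import defaultdict


def micro_batches(events, batch_interval_ms):
    """Group (timestamp_ms, value) events into fixed-interval batches.

    Hypothetical helper for illustration only; returns a list of
    (batch_start_ms, [values]) tuples ordered by batch start time.
    """
    buckets = defaultdict(list)
    for timestamp_ms, value in events:
        # Every event in the same interval lands in the same batch.
        batch_start = (timestamp_ms // batch_interval_ms) * batch_interval_ms
        buckets[batch_start].append(value)
    return sorted(buckets.items())


# Made-up events: (arrival time in milliseconds, payload).
events = [(200, "a"), (900, "b"), (1100, "c"), (2500, "d"), (2700, "e")]

# One-second batches; each batch is then handled as a small batch job.
batches = micro_batches(events, batch_interval_ms=1000)
counts_per_batch = [(start, len(values)) for start, values in batches]
```

A smaller batch interval lowers latency at the cost of more, smaller jobs. An engine that handles events one at a time, as the paragraph above describes for Flink, does not need this grouping step.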
I also predicted there would be a rise in awareness of messaging tools with particular capabilities to support efficient streaming architectures. This shift shows up in the increasing popularity of the message transport Apache Kafka and of the new messaging system MapR Streams, which supports the Kafka 0.9 API but is integrated into the MapR converged data platform. Both have happened, at an even higher level than I would have thought.
Message streams, shown here as horizontal cylinders, are the heart of a streaming architecture. Multiple applications (consumers) can share the streaming data without danger of cross-interference. Here we remind you of four popular data processors, although you would not likely be using them all at once. (image © E.Friedman 2016)
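One way to picture how several consumers can share a stream without interfering with each other is a toy append-only log in which each consumer keeps its own read position. The sketch below is plain Python and is not the Kafka or MapR Streams API; the class `MessageStream`, its methods and the consumer names are hypothetical.

```python
class MessageStream:
    """Toy append-only message log with one read offset per consumer.

    Illustration only: real message transports add partitioning,
    persistence and replication, none of which is modeled here.
    """

    def __init__(self):
        self._log = []       # messages in arrival order, never modified
        self._offsets = {}   # consumer name -> index of next unread message

    def publish(self, message):
        self._log.append(message)

    def poll(self, consumer, max_messages=10):
        """Return the next unread messages for this consumer only."""
        start = self._offsets.get(consumer, 0)
        batch = self._log[start:start + max_messages]
        self._offsets[consumer] = start + len(batch)
        return batch


stream = MessageStream()
for message in ["m1", "m2", "m3"]:
    stream.publish(message)

# Two independent consumers read the same stream at their own pace.
analytics_first = stream.poll("analytics", max_messages=2)
dashboard_all = stream.poll("dashboard")
analytics_rest = stream.poll("analytics")
```

Because reading only advances that consumer's own offset and never removes anything from the log, the "dashboard" consumer still sees every message even though "analytics" read some of them first.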
Based on topics discussed at international big data conferences in the spring and early summer, such as the Strata conferences in San Jose and London, the Hadoop Summit conferences in Dublin and San Jose, Spark Summit in San Francisco and the Berlin Buzzwords conference, streaming data is very much the rage. With co-author Ted Dunning, I’ve done a lot of book signings for the O’Reilly publication titled Streaming Architecture, and the people who show up are enthusiastically seeking information about how to design streaming projects and about the technologies that best support them.
The Berlin Buzzwords conference in June, for instance, particularly demonstrated people’s enthusiasm for streaming data, with 17 presentations on stream-related topics, including a keynote on streaming with Kafka-style message transport and 9 technical talks on Apache Flink.
“Businesses want practical ways to get to value faster, … you are likely to try out Apache Drill some time in 2016 if your business has any need for SQL.”
Apache Drill has had a good year so far, with substantial advances in the April release of version 1.6 and additional improvements in the latest release, version 1.7, in June.