One would obviously expect Hadoop to dominate the discussions at the recent Strata & Hadoop World conference in San Jose, CA. But much of the buzz this year was around Apache Spark, and how Spark might fit into the data management strategies of many organizations.
Arno Candel, chief architect at H2O.ai, shared his observations with Information Management on what conference attendees were most interested in, and how those needs are influencing his company’s go-to-market strategies.
Information Management: What are the most common themes that you heard among conference attendees and how do those themes align with what you expected?
Arno Candel: Many of the people I spoke with were interested in how Spark can, or would, fit into their overall data management and analytics strategy. We at H2O.ai have been seeing increasing interest in Spark, which was one of the reasons that we built out Sparkling Water, our Spark API. Even so, I’ve always thought of Strata as a Hadoop conference; it is, after all, merged with Hadoop World.
It’s now clear that data storage is essentially a solved problem, while in-memory analytics and machine learning are driving most of the ongoing work in the field. We see ourselves as very much aligned with this trend.
IM: What are the most common data challenges that attendees are facing?
AC: Turning data into actionable insights has been, and remains, a key challenge for many organizations. Everyone has been told that they need to store more and more information in data stores like Hadoop, but there is often a lack of a plan for the “day after.” What do organizations do once they’ve stored all their data in a data lake? They realize that they need some kind of analytics strategy, but aren’t sure exactly what that should look like.
In addition, there is a huge problem with regard to data cleansing; much of the data that organizations have stored is messy, has missing variables, etc., and organizations need to find a way to deal with that.