Several themes emerged from the recent Strata & Hadoop World conference in San Jose, among them the use of data to improve lives, from healthcare to housing to a dozen social issues in between.
Prakash Nanduri, CEO at Paxata, shared his observations on this theme, and his delight in it, with Information Management. He also spoke about the maturity of Hadoop in the market and the wide-scale use of business intelligence and analytics among organizations today.
Information Management: What are the most common themes that you heard among conference attendees and how do those themes align with what you expected?
Prakash Nanduri: One of the most surprising themes I saw throughout the conference was “Data for good.” Very interesting work is being done on predicting Ebola outbreaks, ongoing diabetes research, and mapping low-income housing to those in need.
It’s always expected that you will attend sessions about banks using data to reduce risk or retail companies using data to find more customers, but it was very encouraging to see so much work being done by data scientists to improve lives.
IM: What are the most common data challenges that attendees are facing?
PN: The market for data analytics continues to grow – there is a BI application or tool for every use case. The only challenge there might be understanding how these tools differ and whether or not the business needs them all.
On the data management side, it feels like those who have made the move to Hadoop for a modern data collection and storage architecture now want to make all the rich data they have available to their business teams. Their usual approaches for doing that are not keeping up. Having to take data out of Hadoop and put it into traditional data warehouses defeats the purpose, and even if that were economically feasible, the speed and volume of data just break every traditional process.
People are looking for ways to understand the data without having to build schemas, and they want to shape it, clean it and work with it without creating month-long IT projects. Without a doubt, user-driven data preparation, big data discovery and real-time big data analytics were all hot topics, which signals to me that everyone is trying to figure out how to make better use of their data.
The other interesting thing I noticed is that everyone is struggling with data preparation and big data discovery. I think the perception is that these problems are faced only by business people and that data scientists have magic tools to make this easy, but clearly, they would rather not be doing this work if they could instead be doing high-value analysis.
IM: What are the most surprising things that you heard from attendees regarding their data management initiatives?
PN: Hadoop has grown up. I heard fewer discussions about Hadoop as a science project being run by a small team, and more dialog about operational systems that are getting hooked to business initiatives. For example, one forward-thinking team is doing a ton of security analytics on all the sensor data being collected in Hadoop. They are establishing baselines they could never establish before because they now have two years’ worth of data to help them identify patterns, anomalies and outliers.
The other interesting thing I noted was the maturity of the ecosystem as a whole.