Streaming analytics, machine learning and advanced analytics were among the most talked-about themes at this year’s Strata and Hadoop World conference in New York. Information Management spoke with Jack Norris, senior vice president, data and applications, at MapR Technologies about what organizations are doing in these areas, and where the greatest challenges lie.
Information Management: What are the most common themes that you heard among conference participants?
Jack Norris: There were a few big themes, and an interesting non-theme. Topics like streams, machine learning, and advanced analytics were prevalent throughout the conference.
Many data-driven organizations are looking at event streaming architectures for creating new ways to derive business value from their data. This entails not only advantages around acting in real time, but also data architectures that enable faster time to value.
A microservices architecture is an example of a development and deployment paradigm that leverages a streaming, or “publish-subscribe,” framework to promote rapid application development and agility. These topics all point to a greater industry focus on data platforms that can enable streams, machine learning, and a wide range of analytics.
This is why the conference was particularly interesting for us. Organizations are recognizing that the first big challenge they need to address is identifying a platform that can cover all their business requirements, not only for today and the near future, but for years to come.
Conversations specifically around Apache Hadoop have dropped significantly since last year. More discussions start with requirements and goals around the use of data, versus a presumption that a particular technology is the starting point for solving business problems. These platform discussions are important, and dimensions such as performance, scalability, and reliability are top of mind.
IM: What are the most common data challenges that attendees are facing?
Norris: The most common data challenges stem from applications dictating how data is organized and stored. Data is prepared into specialized schemas to serve the application. Each application has its own dedicated silo, and the result is that you have a proliferation of silos.
The average company has hundreds of data silos throughout its organization. It’s a major challenge to deal with the many ETL processes, the data duplication, and the different protection schemes. Big Data solutions have increased the number of these silos, which also impact data analytics.