Eckerson is founder and principal consultant at Eckerson Group, a research and consulting firm that helps business leaders use data and technology to drive better insights and actions.
Latest posts by Wayne W. Eckerson
- The Modern Analytic Platform - December 3, 2016
- Ten “Design First” Principles from Strata/Hadoop World NYC - November 9, 2016
Several themes emerged from my conversations with more than 35 vendors at the Strata/Hadoop World show in New York City this week. (See list of vendors I visited at the end.) Topping the list: automated machine learning services, simplified streaming platforms, end-to-end data lake management software, and powerful, low-cost data and hardware infrastructure. More importantly, I spotted the emergence of ten “design first” principles that will guide the development of data-driven applications in the future.
If you want to see and feel the pace of innovation in the analytics field, just spend a few hours at a Strata+Hadoop World conference.
I’ve been covering business intelligence, analytics, and data management for 20+ years and Strata+Hadoop World made me feel like a novice. Of the more than 100 exhibitors on the show floor, I only recognized about two-thirds. Where did all the new vendors come from? Whose needs are they serving?
For that matter, I hardly recognized many established vendors. The oldest ones are pivoting abruptly, embracing open source, cloud, streaming, subscription pricing, and freemium business models. Most speak a language of Apache projects that I can hardly understand or keep up with. It’s all quite dizzying.
Themes and Market Segments
Before I mention the ten “design-first” principles that emerged from the show, let me categorize in more detail the types of capabilities that vendors are delivering:
- Simplified streaming and stream-based analytic processing and alerting for the real-time enterprise.
- Automated creation of machine learning models that eliminate the need for data scientists or increase their productivity significantly.
- End-to-end machine learning development and operational environments that make predictive analytics more accessible to non-data scientists.
- Rent-a-data-scientist services via auctions or competitions (Kaggle and Experfy).
- Software that simplifies and manages the population of data lakes with trustworthy, governed data that is easily accessible to business users.
- Software or infrastructure that reduces the complexity of big data (e.g., Lambda) architectures.
- Hybrid transaction-analytic processing databases (HTAP) that deliver fast queries against real-time data.
- Low-cost, fast, scalable databases and hardware designed for large volumes of streaming data.
In my years covering the space, I’ve discovered that the vendor community is usually about five to seven years ahead of the early majority market. But given the velocity of change in the technology and the eagerness of companies to reduce their IT costs with faster, better, cheaper tools, platforms, and infrastructure, I’m going to halve that number.
It’s pretty clear that there is a revolution going on in the analytics space. There is a lot of dust and debris flying everywhere, but the silhouette of the future is gradually emerging. The development of data-driven applications is starting to adhere to a number of key design principles.
Ten Design-First Principles
In the next several years, organizations will begin designing analytic environments with the following “design first” principles. Design first for….
- Real-time. Even if you need batch applications, build them on a streaming, event-driven infrastructure. It’s fast, cheap, and flexible.
- Prediction. Build analytic models into all business applications, creating a proactive data-driven enterprise that monetizes its data assets.
- APIs. Build applications using microservices and integrate them via standard application programming interfaces, creating highly flexible, extensible applications supported by a community of developers.
- Platform. With API-based applications, your environment is ultimately flexible. It can integrate with or support a multiplicity of internal and third party applications or be embedded in other applications to create high-value, customized data-driven applications and ecosystems.
- Multiple Engines. Rather than force one engine to support many diverse workloads, run each workload on an optimal engine.
- Stationary Data. Once you ingest data, never move it. Query or process data where it lies, using query federation to unify disparate data on the fly or push-down optimization to match workloads with embedded engines.
- Multiple Analytic Tools. Standardize where it counts—on flexible semantic models—rather than toolsets. But where possible, choose tools with open APIs that don’t replicate data.
- Cloud. Design your application for the cloud and hybrid data processing.
- Web. This is an oldie-but-goodie that is now just a given: never use a desktop client.
- Mobile. Another oldie-but-goodie: design your application for mobile delivery using a responsive design.
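To make the first principle a little more concrete, here is a minimal sketch of what "batch on a streaming, event-driven infrastructure" can look like. It is an illustration only, not any vendor's API: the names `Event`, `RunningTotals`, `consume`, and `live_feed` are hypothetical, and an in-memory generator stands in for a real message queue. The idea is that a batch job becomes a replay of a stored event log through the same handler that serves the live stream.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    """Hypothetical event shape: one purchase by one customer."""
    customer: str
    amount: float


class RunningTotals:
    """Event-driven aggregate that is updated one event at a time."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)

    def on_event(self, event: Event) -> None:
        self.totals[event.customer] += event.amount


def consume(stream: Iterable[Event], handler: RunningTotals) -> RunningTotals:
    """Feed events to the handler in arrival order."""
    for event in stream:
        handler.on_event(event)
    return handler


def live_feed() -> Iterator[Event]:
    """Stand-in for an unbounded source such as a message queue."""
    yield Event("acme", 10.0)
    yield Event("globex", 5.0)
    yield Event("acme", 2.5)


# Real-time use: events are handled as they arrive.
live = consume(live_feed(), RunningTotals())

# "Batch" use: replay a stored, bounded log through the same handler.
event_log = [Event("acme", 10.0), Event("globex", 5.0), Event("acme", 2.5)]
batch = consume(event_log, RunningTotals())
```

Because both paths share one handler, there is a single piece of logic to maintain, which is the kind of simplification the Lambda-architecture vendors at the show were pitching.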
This is an off-the-cuff list of design-first principles. I’m sure there are more. If you think you can add to this list, please let me know. Or meet me at the next Strata+Hadoop event to help me scour the show floor for new and upcoming vendors, technologies, and design-first principles!
I met with senior representatives from the following 36 vendors at Strata NYC. Most I met for 30-60 minutes, some for a brief 5 to 10-minute chat. This is a pretty big list. (Who said industry analysts don’t earn their keep?!) The irony is that there were many, many more vendors I wanted to talk with but didn’t have time during my two days at the event.
Arcadia Software – OLAP on Hadoop software
Ataccama – MDM software
Altiscale – Big data as a service, just purchased by SAP
Anodot – Online anomaly detection service for streaming time-series data
AtScale – OLAP on Hadoop software
Attunity – Data replication, DW automation, change data capture
Basho – Scale-out, high availability operational database
Bitwise – Graphical ETL design tool for Hadoop
Business Analytics Collaborative – New online community focused on analytics
Cambridge Semantics – Graph-based analytics tool
Confluent – Commercial providers of Kafka messaging service
Data Artisans – Commercial providers of Flink stream processing software
Dataiku – End-to-end, Web-based machine learning platform that runs on Hadoop
Dataguise – Data masking and encryption software for Hadoop and other platforms
DataFactZ – Analytics consultancy
DataRobot – Online data science service that automatically generates highly accurate analytic models from customer data sets
iguazio – Unified, interactive data repository for any big data engine
Informatica – Data lake management software with collaboration across multiple user roles
Kinetica – Low-cost, extremely fast, scale-out, columnar in-memory database that provides high-compute power via GPU chips on streaming data
Kognitio – In-memory MPP database that makes Tableau truly interactive and now runs for free on Hadoop
Logtrust – Analytical tool for log and text data; a more modern Splunk
Magnitude Software – Venture-backed roll-up providing packaged analytic applications and connectors
MemSQL – Hybrid Transactional Analytical Processing (HTAP) database
Nvidia – High-performance GPU-based servers (used by Kinetica and others)
Paxata – Hadoop-based, stand-alone data preparation tool
Podium Data – Data lake management software
Semantify – Search-based analytics tool with natural language queries
SnapLogic – Cloud-based application and data integration software
Splice Machine – Hybrid Transactional Analytical Processing (HTAP) database
StreamAnalytix – Stream processing software
Talend – Big data integration software
Teradata – Data warehousing database and analytic software (Aster Data)
The Linux Foundation – Manages ODPi (the Open Data Platform initiative)
VoltDB – Hybrid Transactional Analytical Processing (HTAP) database
Zaloni – Data lake management software
Zoomdata – Real-time analytics platform for creating custom analytic applications