Ten “Design First” Principles from Strata/Hadoop World NYC

Wayne W. Eckerson

Founder and principal consultant at Eckerson Group, a research and consulting firm that helps business leaders use data and technology to drive better insights and actions. Eckerson is an internationally recognized thought leader in business intelligence and analytics who thinks critically, writes clearly, and presents persuasively about complex topics. He is a sought-after consultant, noted speaker, and bestselling author.


Several themes emerged from my conversations with more than 35 vendors at the Strata/Hadoop World show in New York City this week. (See list of vendors I visited at the end.) Topping the list: automated machine learning services, simplified streaming platforms, end-to-end data lake management software, and powerful, low-cost data and hardware infrastructure. More importantly, I spotted the emergence of ten “design first” principles that will guide the development of data-driven applications in the future.

If you want to see and feel the pace of innovation in the analytics field, just spend a few hours at a Strata+Hadoop World conference.

I’ve been covering business intelligence, analytics, and data management for 20+ years, and Strata+Hadoop World made me feel like a novice. Of the more than 100 exhibitors on the show floor, I recognized only about two-thirds. Where did all the new vendors come from? Whose needs are they serving?

For that matter, I hardly recognized many established vendors. The oldest ones are pivoting abruptly, embracing open source, cloud, streaming, subscription pricing, and freemium business models. Most speak a language of Apache projects that I can hardly understand or keep up with. It’s all quite dizzying.

Themes and Market Segments

Before I mention the ten “design-first” principles that emerged from the show, let me categorize in more detail the types of capabilities that vendors are delivering:

  1. Simplified streaming and stream-based analytic processing and alerting for the real-time enterprise.
  2. Automated creation of machine learning models that eliminate the need for data scientists or increase their productivity significantly.
  3. End-to-end machine learning development and operational environments that make predictive analytics more accessible to non-data scientists.
  4. Rent-a-data scientist services via auction or competitions (Kaggle and Experfy).
  5. Software that simplifies and manages the population of data lakes with trustworthy, governed data that is easily accessible to business users.
  6. Software or infrastructure that reduces the complexity of big data (e.g., Lambda) architectures.
  7. Hybrid transaction-analytic processing databases (HTAP) that deliver fast queries against real-time data.
  8. Low-cost, fast, scalable databases and hardware designed for large volumes of streaming data.

In my years covering the space, I’ve discovered that the vendor community is usually about five to seven years ahead of the early majority market. But given the velocity of change in the technology and the eagerness of companies to reduce their IT costs with faster, better, cheaper tools, platforms, and infrastructure, I’m going to halve that number.

It’s pretty clear that there is a revolution going on in the analytics space. There is a lot of dust and debris flying everywhere, but the silhouette of the future is gradually emerging. The development of data-driven applications is starting to adhere to a number of key design principles.

Ten Design-First Principles 

In the next several years, organizations will begin designing analytic environments with the following “design first” principles. Design first for…

  1. Real-time.  Even if you need batch applications, build them on a streaming, event-driven infrastructure. It’s fast, cheap, and flexible.
  2. Prediction. Build analytic models into all business applications, creating a proactive data-driven enterprise that monetizes its data assets.
  3. APIs. Build applications using microservices and integrate them via standard application programming interfaces, creating highly flexible, extensible applications supported by a community of developers.
  4. Platform. With API-based applications, your environment is ultimately flexible. It can integrate with or support a multiplicity of internal and third party applications or be embedded in other applications to create high-value, customized data-driven applications and ecosystems.
  5. Multiple Engines. Rather than force one engine to support many diverse workloads, run each workload on an optimal engine.
  6. Stationary Data. Once you ingest data, never move it. Query or process data where it lies, using query federation to unify disparate data on the fly or push-down optimization to match workloads with embedded engines.
  7. Multiple Analytic Tools. Standardize where it counts—on flexible semantic models—rather than toolsets. But where possible, choose tools with open APIs that don’t replicate data.
  8. Cloud. Design your application for the cloud and hybrid data processing.
  9. Web. This is an oldie-but-goodie that is now just a given: never use a desktop client.
  10. Mobile. Another oldie-but-goodie: design your application for mobile delivery using responsive design.
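The first principle, batch built on a streaming backbone, can be made concrete with a short sketch. The plain-Python example below uses no real streaming library; `EventLog`, `publish`, and `replay` are hypothetical stand-ins for a platform such as Kafka, not an actual API. A real-time consumer reacts to events as they are published, while a “batch” job becomes nothing more than a bounded replay of the same event log.

```python
class EventLog:
    """A tiny append-only event log standing in for a streaming backbone
    (e.g., Kafka). All names here are illustrative, not a real API."""

    def __init__(self):
        self.events = []
        self.subscribers = []

    def publish(self, event):
        self.events.append(event)
        for handler in self.subscribers:  # real-time path: push to consumers
            handler(event)

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def replay(self, since_ts):
        """Batch path: a 'batch job' is a bounded replay of the stream."""
        return [e for e in self.events if e["ts"] >= since_ts]


log = EventLog()
alerts = []
# Real-time consumer: flag large transactions as they arrive.
log.subscribe(lambda e: alerts.append(e) if e["amount"] > 100 else None)

for i, amount in enumerate([50, 150, 75, 200]):
    log.publish({"ts": i, "amount": amount})

print(alerts)  # the two events with amount > 100, seen in real time
print(sum(e["amount"] for e in log.replay(0)))  # nightly "batch" total: 475
```

The point of the sketch is that both workloads consume one event-driven source of truth: adding a batch report later requires no second pipeline, only another replay over the log.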

This is an off-the-cuff list of design-first principles. I’m sure there are more. If you think you can add to this list, please let me know. Or meet me at the next Strata+Hadoop event to help me scour the show floor for new and upcoming vendors, technologies, and design-first principles!


I met with senior representatives from the following 36 vendors at Strata NYC. Most I met for 30-60 minutes, some for a brief 5 to 10-minute chat. This is a pretty big list. (Who said industry analysts don’t earn their keep?!) The irony is that there were many, many more vendors I wanted to talk with but didn’t have time during my two days at the event.

Arcadia Data – OLAP on Hadoop software

Ataccama – MDM software

Altiscale – Big data as a service, just purchased by SAP

Anodot – Online anomaly detection service for streaming time-series data

AtScale – OLAP on Hadoop software

Attunity – Data replication, DW automation, change data capture


Basho – Scale-out, high availability operational database

Bitwise – Graphical ETL design tool for Hadoop

Business Analytics Collaborative – New online community focused on analytics

Cambridge Semantics – Graph-based analytics tool

Confluent – Commercial providers of Kafka messaging service

Data Artisans – Commercial providers of Flink stream processing software

Dataiku – End-to-end, Web-based machine learning platform that runs on Hadoop

Dataguise – Data masking and encryption software for Hadoop and other platforms

DataFactZ – Analytics consultancy

DataRobot  – Online data science service that automatically generates highly accurate analytic models from customer data sets

iguazio – Unified, interactive data repository for any big data engine

Informatica – Data lake management software with collaboration across multiple user roles

Kinetica – Low-cost, extremely fast, scale-out, columnar in-memory database that provides high-compute power via GPU chips on streaming data.

Kognitio – In-memory MPP database that makes Tableau truly interactive and now runs for free on Hadoop.

Logtrust – Analytical tool for log and text data; a more modern Splunk

Magnitude Software – Venture-backed roll up providing packaged analytic applications and connectors.

MemSQL – Hybrid Transactional Analytical Processing (HTAP) database

Nvidia – High-performance GPU-based servers (used by Kinetica and others)

Paxata – Hadoop-based, stand-alone data preparation tool

Podium Data – Data lake management software

Semantify – Search-based analytics tool with natural language queries

SnapLogic – Cloud-based application and data integration software

Splice Machine – Hybrid Transactional Analytical Processing (HTAP) database

StreamAnalytix – Stream processing software

Talend – Big data integration software

Teradata – Data warehousing database and analytic software (Aster Data)

The Linux Foundation – Manages the Open Data Processing Institute

VoltDB – Hybrid Transactional Analytical Processing (HTAP) database

Zaloni – Data lake management software

Zoomdata – Real-time analytics platform for creating custom analytic applications


Originally posted on…
