Data Drives the Evolution of Movement Technology

We hear a lot today about streaming data, fast data, and data in motion. It’s as if until now data has been stagnant, just sitting in some dusty database and never moving. The truth is that we have always needed ways to move data.

Historically, the industry has been pretty inventive about getting this done. From the early days of data warehousing and extract, transform, and load (ETL) to today’s real-time streaming ingest systems, we have continued to adapt and create new ways to move data sensibly, even as its appearance and motion patterns have dramatically changed.

In our new data-driven world, exerting firm control over data in motion is an increasingly critical competency, one that is becoming core to successful business operations. Based on more than 20 years in enterprise data, here is my view of where we have been and where we are going as we evolve into a world in which the full force of data volume, velocity, and variability takes hold.

First Generation: Stocking the Warehouse via ETL

Let’s roll back a couple of decades. The first substantial data-movement problems emerged in the mid-1990s with the trend toward data warehousing. The goal was to move transaction data provided by disparate applications or residing in databases into the newly minted data warehouse. Organizations operated a variety of applications, such as offerings from SAP, PeopleSoft, and Siebel, and a variety of database technologies from vendors like Oracle and IBM. As a result, there was no simple way to move the data; each integration was a bespoke project requiring an understanding of vendor-specific schemas and languages. The inability to “stock the warehouse” efficiently led to data warehouse projects failing or becoming excessively expensive.

ETL tools addressed this initial data-movement problem by creating connectors for applications and databases to load the warehouse. For each source, one needed only to specify the fields and map them into the warehouse. The engine did the rest of the work. I refer to this first generation as schema-driven ETL. It was developer-centric, focused on preprocessing (aggregating and blending) data at scale from multiple sources to get it uniformly into a warehouse for business intelligence (BI) consumption. Large companies spent millions of dollars on these first-generation tools that allowed developers to move data without dealing with the myriad languages of custom applications.
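To make the "specify the fields and map them" idea concrete, here is a minimal Python sketch of schema-driven mapping. It is an illustration only, not how any particular ETL product worked: the source names (`crm`, `erp`), the field names, and the `to_warehouse_row` helper are all hypothetical.

```python
# Hypothetical declarative mappings: source field -> (warehouse column, converter).
# The source systems and field names are invented for illustration.
MAPPINGS = {
    "crm": {
        "CUST_NM": ("customer_name", str.strip),
        "ORD_AMT": ("order_amount", float),
    },
    "erp": {
        "customerName": ("customer_name", str.strip),
        "total": ("order_amount", float),
    },
}


def to_warehouse_row(source: str, record: dict) -> dict:
    """Apply the declared mapping for one source to one source record."""
    row = {"source_system": source}
    for field, (column, convert) in MAPPINGS[source].items():
        row[column] = convert(record[field])
    return row


# Two differently shaped source records end up in one uniform warehouse shape.
rows = [
    to_warehouse_row("crm", {"CUST_NM": " Acme ", "ORD_AMT": "120.50"}),
    to_warehouse_row("erp", {"customerName": "Globex", "total": "99"}),
]
```

The point of the design is that adding a source means adding a mapping entry rather than writing new movement code; the generic "engine" (here, a single function) stays unchanged.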

This first generation became a multibillion-dollar industry.

Second Generation: Less Cloudy Skies via iPaaS

Over time, consolidation in the database and application world created a more homogeneous, standards-based world. Organizations began to wonder if ETL was even necessary, now that the new world order had done away with the fragmentation that had spawned its existence, and only a small number of database/application mega-vendors remained.

But a new challenge replaced the old. By the mid-2000s, the emergence of SaaS apps added another layer of complexity to the data-movement challenge. The new questions were: “How do we get cloud-based transaction data into warehouses? How do we synchronize information between different cloud applications? Should we deploy integration middleware in the cloud, on-premises, or both?”

As the SaaS delivery model proliferated, customer, product, and other domain data became fragmented across dozens of different applications with inconsistent, overlapping, or redundant data structures. Because cloud applications are API-driven rather than language-driven, organizations had to rationalize across the different flavors of APIs needed to send data between these various locations.
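As a rough illustration of what rationalizing overlapping data structures can involve, the Python sketch below merges customer records from two SaaS applications into one view keyed by email address. Everything here is an assumption made for the example: the two applications, their payload shapes, the field names, and the rule that the first non-empty value wins.

```python
def normalize_email(value: str) -> str:
    """Use a trimmed, lowercased email as the shared customer key."""
    return value.strip().lower()


def merge_customers(billing: list, support: list) -> dict:
    """Combine overlapping customer records; the first non-empty value wins."""
    merged = {}
    for rec in billing:
        key = normalize_email(rec["email"])
        customer = merged.setdefault(key, {"email": key})
        customer["plan"] = customer.get("plan") or rec.get("plan") or None
    for rec in support:
        key = normalize_email(rec["contact_email"])
        customer = merged.setdefault(key, {"email": key})
        customer["name"] = customer.get("name") or rec.get("full_name") or None
        customer["plan"] = customer.get("plan") or rec.get("plan") or None
    return merged


# Hypothetical payloads: the same customer, described differently by each app.
billing_app = [{"email": "Ann@Example.com", "plan": "pro"}]
support_app = [{"contact_email": "ann@example.com ", "full_name": "Ann Lee", "plan": ""}]

customers = merge_customers(billing_app, support_app)
```

Real integrations also have to handle conflicting values, missing keys, and API rate limits, none of which this sketch attempts.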

The cloud forced data-movement technologies to evolve from analytic data integration, the sweet spot for data warehouses, to operational integration: moving data between applications, which increased the pressure on the system to deliver trustworthy data quickly.

 
