How to Manage the Tension between Data Control and Access

There has always been an uneasy truce within large organizations between those who control access to data – the IT group, usually – and those who need that data to improve business performance. In a perfect world, the IT group would like to see a single source of truth manifested in master data management (MDM) and the enterprise data warehouse (EDW).

Let’s consider MDM. A paper by Wilbram Hazejager of DataFlux Corporation (acquired by SAS in 2000) notes that MDM’s origins go back to the early 2000s. Its proponents did – and still do – see MDM as the way to solve the problem of disparate, disjointed data spread across different lines of business.

Nevertheless, according to Gartner, the majority of MDM initiatives fail. There can be many reasons for this. But one reason is simple: To succeed, MDM demands strict adherence to data-governance policies by everyone in the enterprise, all the time. That’s not very realistic.

But the effort to implement MDM, even if only partially realized, reinforces IT’s role as the gatekeeper of enterprise data. Rapidly growing volumes of data make it all the more difficult to streamline the data supply chain that delivers raw material for analysis to business users. That growth puts IT in the unenviable position of trying to deliver more data sets, faster, while the rest of the enterprise yearns for data democracy.

Along with MDM, the enterprise data warehouse also represents a legacy approach to handling critical business data. Large and expensive to maintain, the typical EDW fulfills a narrow, often application-specific purpose. Moreover, data architects must use extract, transform, and load (ETL) tools to add data to an EDW, which consumes substantial time and money. Simply adding a column of data to an EDW could take months.

The desire to escape IT’s grip on enterprise data has pushed many a business unit into the netherworld of shadow IT, a term for information-technology systems and solutions built and used inside organizations without explicit organizational approval. These solutions often leverage the cloud. It doesn’t take much to deploy a Hadoop cluster in the cloud and start filling it with data, more or less on the sly.

This is not to say that most corporate deployments of Hadoop are “off the books.” They are not. In fact, getting off the ETL treadmill has been one of Hadoop’s main selling points for large enterprises. Hadoop stack vendors have focused most of their marketing dollars on the notion that organizations can move some of their EDW data into Hadoop, which is far cheaper and more flexible in terms of hardware and storage.

These vendors talk about EL (extract and load) rather than ETL. Extract the data and load it into Hadoop; transform it only when necessary for a particular use case. The popularity of Hadoop as a destination for structured as well as unstructured data has spawned several SQL-on-Hadoop solutions, including Hive (on MapReduce or Tez), Impala, Spark SQL, and Presto.
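To make the EL idea concrete, here is a minimal sketch in plain Python rather than on an actual Hadoop stack. The sample records, the in-memory `raw_store` list, and both function names are hypothetical and exist only for illustration. The point it tries to show is the ordering: records are loaded untouched, and parsing and shaping happen later, inside the one query that needs them.

```python
import json

# Hypothetical raw events, standing in for rows extracted from a source system.
RAW_EVENTS = [
    '{"customer": "a1", "amount": "19.99", "channel": "web"}',
    '{"customer": "b2", "amount": "5.00", "channel": "store"}',
    '{"customer": "a1", "amount": "7.50", "channel": "web"}',
]


def extract_and_load(source_lines, raw_store):
    """EL step: copy records into the store untouched, with no upfront schema."""
    raw_store.extend(source_lines)
    return len(source_lines)


def revenue_by_customer(raw_store):
    """Transform on read: parse and shape the data only for this use case."""
    totals_in_cents = {}
    for line in raw_store:
        record = json.loads(line)
        # Amounts arrive as strings; convert to integer cents to avoid float drift.
        cents = round(float(record["amount"]) * 100)
        customer = record["customer"]
        totals_in_cents[customer] = totals_in_cents.get(customer, 0) + cents
    return {customer: cents / 100 for customer, cents in totals_in_cents.items()}


raw_store = []  # assumption: a list stands in for files landed in Hadoop
loaded = extract_and_load(RAW_EVENTS, raw_store)
totals = revenue_by_customer(raw_store)
print(loaded, totals)
```

A different use case, say counting orders per channel, would add another read-time function over the same `raw_store` without reloading or restructuring anything, which is roughly the flexibility the EL pitch is selling.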

Yet there’s far too much data in EDWs for any company to consider putting all their EDW data in Hadoop. Moving a billion rows of data from an EDW takes time. It also puts a load on the primary business system that depends on the data warehouse, which can impact operations. Likewise, an EDW database can handle only so many requests before performance degrades; plus, these data migrations hog enormous network bandwidth. In other words, it’s not a trivial exercise.

So organizations have a foot in both worlds. If organizations ever move all their EDW data to Hadoop, it will be a multi-year, possibly multi-decade, process. Most knowledge about customers, transactions, and products still lives in EDWs. Right now, most enterprises use Hadoop to hold large data sets like log files or sensor data, which are massive, multi-format, and don’t conform well to the schema of an EDW. They may not be sure what value this data holds, but they want some place to put it until they figure it out.

 
