Why Integration and Governance Are Critical for Data Lake Success

Why Integration and Governance Are Critical for Data Lake Success

Why Integration and Governance Are Critical for Data Lake Success

This is the final article in a three-part series exploring what it takes to build a data lake capable of meeting all the requirements of a truly enterprise-scale data management platform. While earlier installments focused on enterprise-scale data management in Hadoop, data onboarding into the data lake, and security, this article will focus on two things: Integrating the data lake within the broader enterprise IT landscape, and data governance.

As more lakes are deployed, we see patterns emerge for how data lakes are positioned relative to existing databases, data warehouses, analytic appliances, and enterprise applications in larger organizations.

Some data lakes are deployed from the outset as centralized system-of record data platforms, serving other systems in an enterprise scale, data-as-a-service model. As a centralized data lake builds momentum, collecting more data and attracting more use cases and users, its value grows as users collaborate on improving and reusing the data.

Other projects start at the edge of the organization to deliver data and meet the analytic needs of a specific business group. A localized data lake often expands to support multiple teams or spawn additional separate data lake instances to support other groups who want the same improved data access as the first group got.

Read Also:
The care and feeding of a data project

Regardless of what pattern the data lake takes as it lands and expands in the organization, the data lake’s increasing role in the organization brings with it new requirements for enterprise readiness.

To be enterprise-ready, the data lake needs to support a set of capabilities that allow it to be integrated within the company’s overall data management strategy and IT applications and data flow landscape.

Here are some requirements to keep in mind:

In addition to streaming the integration of your data lake, you must prepare the lake to support a broad and expanding community of business users.

As more users begin working with the data lake directly or through downstream applications or reporting/analytic systems, the importance of having strong data governance grows. This topic — data governance — is the final dimension of enterprise readiness.

By bringing together typically hundreds of diverse data sets in a large repository and giving users unprecedented direct access to that data, data lakes create new governance challenges and opportunities.

Read Also:
Hortonworks seeks salvation in proprietary software

The challenges have to do with ensuring that data governance policies and procedures exist and are enforced in the lake.  Enterprise-ready data governance in the data lake starts with a clear definition of who owns or has custodial responsibility for each data asset as it enters the lake and as it is maintained and enhanced through the data lake process.

 



Big Data Innovation Summit London

30
Mar
2017
Big Data Innovation Summit London

$200 off with code DATA200

Read Also:
Don’t Be Intimidated By Data Governance

Data Innovation Summit 2017

30
Mar
2017
Data Innovation Summit 2017

30% off with code 7wData

Read Also:
Ask the Data Governance Coach: What is a Data Glossary?

Enterprise Data World 2017

2
Apr
2017
Enterprise Data World 2017

$200 off with code 7WDATA

Read Also:
Data Governance Comes in All Shapes and Sizes

Data Visualisation Summit San Francisco

19
Apr
2017
Data Visualisation Summit San Francisco

$200 off with code DATA200

Read Also:
Google Cloud Platform Adds Machine Learning APIs
Read Also:
Data Governance Comes in All Shapes and Sizes

Chief Analytics Officer Europe

25
Apr
2017
Chief Analytics Officer Europe

15% off with code 7WDCAO17

Read Also:
5 Signs Your Healthcare Organization Needs Data Governance

Leave a Reply

Your email address will not be published. Required fields are marked *