Why Integration and Governance Are Critical for Data Lake Success

This is the final article in a three-part series exploring what it takes to build a data lake that meets the requirements of a truly enterprise-scale data management platform. Earlier installments covered enterprise-scale data management in Hadoop, data onboarding into the data lake, and security. This article focuses on two things: integrating the data lake within the broader enterprise IT landscape, and data governance.

As more data lakes are deployed, patterns are emerging for how they are positioned relative to existing databases, data warehouses, analytic appliances, and enterprise applications in larger organizations.

Some data lakes are deployed from the outset as centralized system-of-record data platforms, serving other systems in an enterprise-scale, data-as-a-service model. As a centralized data lake builds momentum, collecting more data and attracting more use cases and users, its value grows as users collaborate on improving and reusing the data.

Other projects start at the edge of the organization to deliver data and meet the analytic needs of a specific business group. A localized data lake often expands to support multiple teams, or spawns additional, separate data lake instances for other groups that want the same improved data access the first group received.

Regardless of which pattern the data lake follows as it lands and expands, its increasing role in the organization brings new requirements for enterprise readiness.

To be enterprise-ready, the data lake needs to support a set of capabilities that allow it to be integrated within the company's overall data management strategy and its landscape of IT applications and data flows.

Here are some requirements to keep in mind:

In addition to streamlining the integration of your data lake, you must prepare the lake to support a broad and expanding community of business users.

As more users begin working with the data lake, whether directly or through downstream applications and reporting or analytic systems, strong data governance becomes more important. Data governance is the final dimension of enterprise readiness.

By bringing together what are typically hundreds of diverse data sets in one large repository and giving users unprecedented direct access to that data, data lakes create new governance challenges and opportunities.

The challenges center on ensuring that data governance policies and procedures exist and are enforced in the lake. Enterprise-ready data governance in the data lake starts with a clear definition of who owns or has custodial responsibility for each data asset, both as it enters the lake and as it is maintained and enhanced through the data lake process.
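As a rough illustration of the ownership requirement, the sketch below models a catalog that refuses to admit a data asset unless both an owner and a custodian are named. It is a minimal in-memory example, not a description of any particular governance product: the class names (`DataAsset`, `AssetRegistry`), the field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class DataAsset:
    """One data set in the lake plus governance metadata (hypothetical schema)."""
    name: str
    owner: str        # business owner accountable for the data
    custodian: str    # team that maintains the data day to day
    registered_on: date = field(default_factory=date.today)


class AssetRegistry:
    """In-memory stand-in for a data catalog (an assumption for illustration)."""

    def __init__(self):
        self._assets = {}

    def register(self, asset):
        # Enforce the policy at the point of entry into the lake.
        if not asset.owner.strip() or not asset.custodian.strip():
            raise ValueError(f"{asset.name}: owner and custodian are both required")
        if asset.name in self._assets:
            raise ValueError(f"{asset.name}: already registered")
        self._assets[asset.name] = asset

    def owner_of(self, name):
        # Raises KeyError if the asset was never registered.
        return self._assets[name].owner


registry = AssetRegistry()
registry.register(DataAsset("claims_raw", owner="finance", custodian="data-eng"))
print(registry.owner_of("claims_raw"))
```

Rejecting the asset at registration time, rather than auditing for missing owners later, is one way to make the policy enforceable instead of merely documented.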

 


