Charting the data lake: Model normalization patterns for data lakes
- by 7wData
The data lake can be considered the consolidation point for all of the data that is of value across the different parts of the enterprise. A typical data lake is likely to contain a significant range of different types of data repositories, and these repositories are likely to be required to play a number of different roles for different users, for example:
The other key question in building out the data lake repositories is to what level some standardization or consistency of schema is desirable or even necessary. It is a valid choice for an organization to decide that there is no need to enforce any degree of schema standardization across the data lake; in this case, the expectation is that whatever virtualization layer is in place can guide the different users through the array of different structures, the duplication, and the differing terminology. In other cases, the decision is taken that at least some parts of the data lake need to comply with some degree of standardization in their database schemas, even where those databases are still doing a range of different jobs and so may need to be structured differently.
The diagram below shows such a typical collection of different data structures combined together within a data lake.
In the landing zone, the focus is on initially ingesting the data as it arrives into the data lake in a raw format. The landing zone may also be extended to enable some initial processing of the data, making it more generally useful across the data lake. The data scientist sandboxes are typically ad hoc, issue-specific, very flat structures containing many repeating groups of data. The data warehouse typically needs to store data in a reasonably flexible format for various downstream uses, whereas the data marts are often made up of heavily aggregated data focused on a specific business issue or a specific group of users.
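The contrast between these zones can be sketched with the same order data in three shapes. This is a minimal, hypothetical illustration; the field and table names are assumptions, not taken from the article:

```python
# Hypothetical sketch: the same order data shaped for three data lake zones.
# All names and fields here are illustrative assumptions.

# Landing zone: raw records, ingested as-is (items still packed in a string).
raw = [
    {"order_id": 1, "customer": "Acme", "region": "EU", "items": "widget:2;bolt:10"},
    {"order_id": 2, "customer": "Acme", "region": "EU", "items": "widget:1"},
]

# Data scientist sandbox: a very flat structure with repeating groups
# (customer and region repeated on every item row).
sandbox = [
    {"order_id": o["order_id"], "customer": o["customer"], "region": o["region"],
     "product": item.split(":")[0], "qty": int(item.split(":")[1])}
    for o in raw
    for item in o["items"].split(";")
]

# Data mart: aggregated around one business question (units sold per product).
mart = {}
for row in sandbox:
    mart[row["product"]] = mart.get(row["product"], 0) + row["qty"]

print(mart)  # {'widget': 3, 'bolt': 10}
```

Each shape serves its audience: the raw form preserves fidelity, the flat form suits exploratory analysis, and the aggregate answers one business question cheaply.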
A key potential role of a data model is therefore to enable a degree of standardization across such a disparate set of repositories, so that, where possible, the same or similar structures and terminology are used to assist with the subsequent understanding and navigation of the data lake. The need to address potentially different characteristics across these repositories, in terms of how the data is stored and accessed, is where the different normalization patterns come in.
The data models traditionally used in the construction of data warehouse structures would often start with a reasonably high degree of normalization, typically the degree that provides the flexibility needed to represent the various business needs effectively. When these models are then transformed from such a logical, platform-independent format into a more platform-specific format, varying degrees of denormalization take place in order to produce physical models that are performant in the specific physical environment.
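One common denormalization step in that logical-to-physical transformation is pre-joining reference data onto fact rows. A minimal sketch, with hypothetical table and column names:

```python
# Hypothetical sketch: a normalized pair of structures (logical model)
# collapsed into one denormalized, query-ready structure (physical model).
# Names are assumptions for illustration only.

customers = {  # normalized: each customer's attributes stored exactly once
    "C1": {"name": "Acme", "segment": "Enterprise"},
}
orders = [     # normalized: orders reference customers by key
    {"order_id": 1, "customer_id": "C1", "amount": 250.0},
    {"order_id": 2, "customer_id": "C1", "amount": 75.0},
]

# Denormalization: copy customer attributes onto each order row so that
# downstream reads need no join at query time.
denormalized = [{**o, **customers[o["customer_id"]]} for o in orders]

print(denormalized[0])
# {'order_id': 1, 'customer_id': 'C1', 'amount': 250.0,
#  'name': 'Acme', 'segment': 'Enterprise'}
```

The trade-off is the classic one: the denormalized form duplicates customer attributes on every order, buying read performance at the cost of storage and update complexity.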
The focus on denormalization becomes critical in the context of the data lake and specifically in terms of any associated Hadoop/HDFS data structures.
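In HDFS-style storage, that denormalization often surfaces as flat, self-contained records, for example newline-delimited JSON or Parquet files in which every row carries all the attributes a scan needs. A small sketch under that assumption (the format choice and field names are illustrative):

```python
import json

# Hypothetical sketch: pre-joined rows stored as flat, self-contained
# newline-delimited JSON records, a common pattern for HDFS-resident data.
# Field names are illustrative assumptions.

rows = [
    {"order_id": 1, "customer": "Acme", "segment": "Enterprise", "amount": 250.0},
    {"order_id": 2, "customer": "Acme", "segment": "Enterprise", "amount": 75.0},
]

# Each record is complete on its own: reading the file back requires no
# join against a separate customer table.
ndjson = "\n".join(json.dumps(r) for r in rows)
restored = [json.loads(line) for line in ndjson.splitlines()]
assert restored == rows
```

Because joins are expensive in large-scale scan-oriented processing, this repetition-for-read-speed trade is usually the right one for Hadoop/HDFS structures.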