What is a “Data Lake” Anyway?

One of the consequences of the hype and exaggeration that surround Big Data is that the labels and definitions we use to describe the field quickly become overloaded. One of the Big Data concepts we presently risk overloading to the point of complete abstraction is the “Data Lake”.

Data Lake discussions are everywhere right now; to read some of these commentaries, the Data Lake is almost the prototypical use-case for the Hadoop technology stack. Yet there are far fewer actual, reference-able Data Lake implementations than there are Hadoop deployments – and even less documented best practice to tell you how you might actually go about building one.

So if the Data Lake is more architectural concept than physical reality in most organisations today, now seems like a good time to ask: What is a Data Lake anyway? What do we want it to be? And what do we want it not to be?

When you cut through the hype, most proponents of the Data Lake concept are promoting three big ideas:

1) It should capture all data in a centralized, Hadoop-based repository (whatever “all” means);

2) It should store that data in a raw, un-modelled format (see the landing-zone sketch after this list);

3) And that doing so will enable you to break down the barriers that still inhibit end-to-end, cross-functional Analytics in too many organisations.
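
Ideas 1) and 2) are easiest to picture with a small example. The sketch below lands source files in a raw “landing zone”: each file is copied into a centralised, date-partitioned HDFS directory exactly as it arrives, with no parsing or modelling applied. This is an illustrative sketch only; the /datalake/raw layout, the source-system name and the helper function are assumptions of this example rather than any standard, and only the stock hdfs dfs -mkdir and hdfs dfs -put commands are taken as given.

```python
#!/usr/bin/env python3
"""Land raw source files, unmodified, into a date-partitioned HDFS "raw zone".

Illustrative sketch only: the /datalake/raw layout and the source-system
names are assumptions of this example, not a standard.
"""
import subprocess
from datetime import date
from pathlib import Path


def land_raw_file(local_file: Path, source_system: str) -> str:
    """Copy one file into HDFS exactly as it arrived (no parsing, no modelling)."""
    target_dir = f"/datalake/raw/{source_system}/ingest_date={date.today():%Y-%m-%d}"
    # Create the date partition if it does not already exist.
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", target_dir], check=True)
    # -put copies the file byte-for-byte, so the raw zone stays un-modelled.
    subprocess.run(["hdfs", "dfs", "-put", "-f", str(local_file), target_dir], check=True)
    return f"{target_dir}/{local_file.name}"


if __name__ == "__main__":
    # e.g. a nightly CRM export, landed untouched for later schema-on-read use.
    print(land_raw_file(Path("crm_accounts_2017-06-05.json"), "crm"))
```

Because nothing is modelled on the way in, every consuming application has to apply its own schema at read time (so-called “schema-on-read”).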

Now those are lofty and worthwhile ambitions, but at this point many of you could be forgiven a certain sense of déjà vu – because improved data accessibility and integration are exactly what many of you thought you were building the Data Warehouse for.

In fact, many production Hadoop applications are built according to an application-specific design pattern (in technical jargon, a “star schema”), rather than an application-neutral one that allows multiple applications to be brought to a single copy of the data. And whilst there is a legitimate place in most organisations for at least some application-specific data stores, far from breaking down barriers to Enterprise-wide Analytics, many of these solutions risk creating a new generation of data silos.
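
For readers less familiar with the jargon, the sketch below contrasts the two patterns: an application-specific star schema shaped around a single use case (campaign reporting, in this made-up example) versus one shared, un-modelled copy of the data that any application can read. All table and column names are invented for illustration, and sqlite3 is used purely so the DDL is self-contained and runnable; on a Hadoop cluster the same structures would more typically be declared in Hive or Spark SQL.

```python
"""Contrast an application-specific star schema with a shared raw copy.

Hedged sketch: table and column names are invented, and sqlite3 is used
only so the snippet is self-contained and runnable.
"""
import sqlite3

conn = sqlite3.connect(":memory:")

# Application-specific pattern: a star schema shaped around one use case
# (campaign reporting). Other applications would need their own extracts.
conn.executescript(
    """
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,
        segment       TEXT
    );
    CREATE TABLE dim_date (
        date_key      INTEGER PRIMARY KEY,
        calendar_date TEXT
    );
    CREATE TABLE fact_campaign_response (
        customer_key  INTEGER REFERENCES dim_customer(customer_key),
        date_key      INTEGER REFERENCES dim_date(date_key),
        responded     INTEGER
    );
    """
)

# Application-neutral alternative: a single un-modelled copy of the events,
# which every application reads and interprets for itself (schema-on-read).
conn.execute(
    "CREATE TABLE raw_events (event_json TEXT, source_system TEXT, ingest_date TEXT)"
)
```

Each new star schema is quick to build and convenient for its owning application, which is precisely why they multiply; the trade-off is that the same source data ends up copied and re-modelled many times over.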

A few short years after starting its Hadoop journey, one leading Teradata customer has already deployed more than twenty sizeable, application-specific Hadoop clusters.