4 Best Practices for Data Lakes

by 7wData
July 19, 2016

Data lakes are a still-evolving way for companies to better leverage Big Data. Understanding data lake use cases is a good starting point.

Data lakes sound simple: pool data into a Big Data system that combines processing speed with storage (a Hadoop cluster or an in-memory solution) so the business can mine it for new insight. As with so much in technology, though, the reality is far more challenging than the dream.

Part of that is a misunderstanding of what a data lake should be, said the man who coined the term, Pentaho founder and CTO James Dixon. He never intended data lakes to describe a huge Hadoop repository that pulled data from all enterprise applications.

"When people ask what a data lake is, I tell them it's what you used to have on tape. Take what you have on tape and pour it into a data lake and start exploring that data," Dixon said. "Our story was always: only put into Hadoop what you need to. If you want to combine information from the data lake with information in your CRM system, well, just do a join; do that blending of data only when you need to."

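Dixon's "just do a join" advice amounts to blending data at query time instead of copying everything into the lake up front. The sketch below is a minimal, hypothetical illustration: the record layouts (`customer_id`, `event`, `segment`) and sample values are invented, and a real deployment would run the join in whatever query engine sits on the lake rather than in application code.

```python
from collections import defaultdict

# Hypothetical raw events as they might land in the lake: unmodeled, append-only.
lake_events = [
    {"customer_id": 1, "event": "page_view"},
    {"customer_id": 2, "event": "purchase"},
    {"customer_id": 1, "event": "purchase"},
    {"customer_id": 3, "event": "page_view"},
]

# Hypothetical CRM extract, pulled only when a question actually needs it.
crm_customers = [
    {"customer_id": 1, "segment": "enterprise"},
    {"customer_id": 2, "segment": "smb"},
]


def blend_on_demand(events, customers):
    """Inner-join lake events to CRM records at query time (a simple hash join).

    Nothing from the CRM is copied into the lake ahead of time; the two
    sources are combined only for the question being asked.
    """
    # Build side: index the smaller CRM table by its join key.
    by_id = {c["customer_id"]: c for c in customers}
    # Probe side: stream the raw events, keeping only those with a CRM match.
    for e in events:
        crm = by_id.get(e["customer_id"])
        if crm is not None:
            yield {**e, "segment": crm["segment"]}


def purchases_by_segment(events, customers):
    """Answer one ad hoc question: how many purchases per CRM segment?"""
    counts = defaultdict(int)
    for row in blend_on_demand(events, customers):
        if row["event"] == "purchase":
            counts[row["segment"]] += 1
    return dict(counts)


print(purchases_by_segment(lake_events, crm_customers))  # → {'smb': 1, 'enterprise': 1}
```

Because the join is an inner join, the event for customer 3 (absent from the CRM extract) simply drops out of the answer rather than forcing the CRM data to be modeled into the lake in advance.
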
Despite Dixon's intentions, the term took on a broader meaning with bigger promises. Folks began viewing Big Data lakes as a way to solve integration headaches by bringing all data into one super-fast, easy-to-access repository.

Instead, the repositories turned into slow, unyielding data swamps. Analyzing their contents required specialized Big Data expertise, and conclusions drawn from raw data raised red flags about data quality and governance.

"Everybody wanted to look at a data lake as the silver bullet for IT. Has there ever been one? I'm still waiting," said Nick Heudecker, who researches data management for Gartner's IT Leaders (ITL) Data and Analytics group. "I think once you get beyond that discovery phase, you need to do more. Data lakes, that same infrastructure, can help, but you need to go into more of a professional information management world once you've used that data to answer the questions that you generated."

So, given the reality of data lakes, how can you use them to your organization's advantage? Experts say there are four key data lake best practices:

To build a successful data lake, enterprises need to abandon the idea that a data lake will let them collect all their data in one place. It's also important to understand that data lakes are not a replacement for enterprise data management systems and practices, at least not given the current state of Big Data technology.

"Organizations are still talking about data lakes, but they're also recognizing that all lakes are not equal," said Jack Norris, senior VP of Data and Applications with MapR. "There's a certain amount of capabilities you need, or we've heard people talk about data swamps, where it's hard to get data to flow out or in; it's just stagnating there."

Given that the data lake didn't work out as planned, is it still viable? Yes, provided you understand its limits, experts say.

"I have a pretty scoped view (I don't want to say narrow), but a very scoped view of what a data lake is," Heudecker said. "To me, it's a data science sandbox. It's where you play with data and you try to find new insights. Once you've found that new insight, does it make sense to leave the data in its raw format? I would argue that it doesn't, because you now need to optimize the data. You need to ensure that it's governed, that it's semantically consistent, and that it will meet the needs of the business consumers. So to me, the data lake is a lab. And you can do other things with it, but for me, when I'm advising clients, that's how I try to advise them to think about their data lake."

That isn't as limiting as it may sound. For instance, Heudecker notes enterprises use data lakes to extract insight from Internet of Things deployments.
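Heudecker's point that sandbox findings should be optimized, governed, and made semantically consistent before business users rely on them can be sketched as a small promotion step from raw lake rows into a typed schema. This is a minimal illustration, not a method Heudecker or Gartner prescribes; the `SensorReading` schema, the raw field names (`device`, `ts`, `temp`, `unit`), and the Fahrenheit-to-Celsius rule are all hypothetical, chosen to echo the IoT use case.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical curated schema: one agreed name, type, and unit per field.
@dataclass(frozen=True)
class SensorReading:
    device_id: str
    recorded_at: datetime
    temperature_c: float


def to_celsius(value, unit):
    """Normalize temperatures to one unit (assumes devices report C or F)."""
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32) * 5 / 9
    raise ValueError(f"unknown unit {unit!r}")


def promote(raw_rows):
    """Split raw sandbox rows into curated records and rejected rows.

    Each raw row is a dict of strings, as it might sit in the lake. Rows that
    cannot be coerced to the schema are set aside with a reason instead of
    being passed on silently to business consumers.
    """
    curated, rejected = [], []
    for row in raw_rows:
        try:
            reading = SensorReading(
                device_id=row["device"].strip(),
                recorded_at=datetime.fromisoformat(row["ts"]),
                temperature_c=to_celsius(float(row["temp"]), row.get("unit", "C")),
            )
        except (KeyError, ValueError) as exc:
            rejected.append((row, str(exc)))
            continue
        curated.append(reading)
    return curated, rejected


# Hypothetical raw rows with inconsistent formatting, units, and gaps.
raw_rows = [
    {"device": " d-01 ", "ts": "2016-07-19T10:00:00", "temp": "21.5"},
    {"device": "d-02", "ts": "2016-07-19T10:00:00", "temp": "212", "unit": "F"},
    {"device": "d-03", "ts": "not a time", "temp": "20"},
    {"device": "d-04", "ts": "2016-07-19T10:00:00"},  # no temperature
]

curated, rejected = promote(raw_rows)
print(len(curated), len(rejected))  # → 2 2
```

Keeping rejected rows alongside a reason, rather than dropping them, preserves the audit trail that governance of the curated data depends on.
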
