4 Best Practices for Data Lakes

Data lakes are a still-evolving way for companies to better leverage Big Data. Understanding data lake use cases is a good starting point.

Data lakes sound simple: Pool data or information into a Big Data system that combines processing speed with storage -- a Hadoop cluster or an in-memory solution -- so the business can access it for new insight. As with so much in technology, though, the reality is much more challenging than the dream.

Part of the problem is a misunderstanding of what a data lake should be, said the man who coined the term, Pentaho founder and CTO James Dixon. He never intended the term to describe a huge Hadoop repository that pulls data from every enterprise application.

"When people ask what a data lake is, I tell them it's what you used to have on tape. Take what you have on tape and pour it into a data lake and start exploring that data," Dixon said. "Our story was always only put into Hadoop what you need to; if you want to combine information from the data lake with information in your CRM system, well just do a join, do that blending of data only when you need to."
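As a rough illustration of the "blend only when you need to" idea Dixon describes, the sketch below joins raw events from a lake with CRM records at query time instead of copying the CRM data into the lake first. The field names (`customer_id`, `event`, `name`), the sample records, and the `blend_on_demand` helper are all hypothetical, chosen only for this example; a real deployment would run the join in a query engine or integration tool rather than in plain Python.

```python
# Hypothetical sample data: raw events as they might sit in a data lake,
# and customer records as they might be read from a CRM system.
lake_events = [
    {"customer_id": 1, "event": "page_view"},
    {"customer_id": 2, "event": "purchase"},
    {"customer_id": 1, "event": "purchase"},
    {"customer_id": 3, "event": "page_view"},  # no matching CRM record
]
crm_customers = [
    {"customer_id": 1, "name": "Acme Corp"},
    {"customer_id": 2, "name": "Globex"},
]


def blend_on_demand(events, customers):
    """Inner-join lake events with CRM records on customer_id at query time."""
    # Index the CRM records once so each event lookup is a dictionary access.
    by_id = {c["customer_id"]: c for c in customers}
    return [
        {**event, "name": by_id[event["customer_id"]]["name"]}
        for event in events
        if event["customer_id"] in by_id
    ]


blended = blend_on_demand(lake_events, crm_customers)
for row in blended:
    print(row)
```

The CRM data stays in its own system and is read only when a question calls for it, which is the opposite of loading every application's data into the lake up front.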

Despite Dixon's intentions, the term took on a broader meaning with bigger promises. Folks began viewing Big Data lakes as a way to solve integration headaches by bringing all data into one super-fast, easy-to-access repository.

Instead, the repositories turned into slow and unyielding data swamps. Big Data required special expertise to analyze. The conclusions that resulted from using raw data raised red flags about data quality and governance.

"Everybody wanted to look at a data lake as the silver bullet for IT. Has there ever been one? I'm still waiting," said Nick Heudecker, who researches data management for Gartner's IT Leaders (ITL) Data and Analytics group. "I think once you get beyond that discovery phase, you need to do more. Data lakes, that same infrastructure can help, but you need to go into more of a professional information management world once you used that data to answer the questions that you generated."

So given the reality of data lakes, how can you utilize them to your organization's advantage? Experts say there are four key data lake best practices:

To build a successful data lake, enterprises need to throw out the idea that a data lake will let them collect all their data in one place. It's also important to understand that data lakes are not a replacement for enterprise data management systems and practices -- at least, not given the current state of Big Data technology.

"Organizations are still talking about data lakes but they're also recognizing that all lakes are not equal," said Jack Norris, senior VP of Data and Applications with MapR. "There's a certain amount of capabilities you need or we've heard people talk about data swamps, where it's hard to get data to flow out or in, it's just stagnating there."

Given that the data lake didn't work out as planned, is it still viable? Yes, provided you understand its limits, experts say.

"I have a pretty scoped view - I don't want to say narrow - but a very scoped view of what a data lake is," Heudecker said. "To me, it's a data science sandbox. It's where you play with data and you try to find new insights. Once you've found that new insight, does it make sense to leave data in its raw format? I would argue that it doesn't because you now need to optimize the data. You need to ensure that it's governed, that it's semantically consistent, that it will meet the needs of the business consumers, so to me the data lake is a lab. And you can do other things with it, but for me, when I'm advising clients, that's how I try to advise them to think about their data lake."

That isn't as limiting as it may sound. For instance, Heudecker notes enterprises use data lakes to extract insight from Internet of Things deployments.

 
