Building a Smart Data Lake While Avoiding the ‘Dump’

A data lake needs to be fed and governed properly before analytics can discover kernels of insight.

Data lakes are all the rage right now and will continue to grow in 2017, but they should be much more than a dumping ground for unmodeled and unverified data of all types. Companies need to approach them strategically, and with a solid understanding of current best practices, in order to keep management overhead to a minimum and give various analytics tools the best shot at extracting meaningful insights.

In a recent webinar from TDWI and Pentaho, Philip Russom, the senior research director of data management at TDWI, said, “You can’t just plan your lake as a data repository. You also need to plan the toolage around it.”

Data lakes are a function of companies collecting more data than ever before and then demanding that technical teams produce new insights from it. Data is persisted in its raw state so that the lake can handle large volumes of diverse data, ingest it quickly, and leave analysts plenty of opportunities to attack it with new technology.
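To make the "raw state" idea concrete, here is a minimal sketch of schema-on-read landing with PySpark. The bucket names, paths, and date-based partitioning are illustrative assumptions, not something prescribed in the webinar; the point is simply that the payload is stored untouched and structure is applied later, at read time.

```python
# A minimal sketch of schema-on-read landing, assuming PySpark and an
# illustrative lake layout (raw zone partitioned by ingest date).
from datetime import date
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("raw-landing").getOrCreate()

# Read the source exactly as it arrives: no parsing, no schema enforcement.
raw = spark.read.text("s3a://incoming/clickstream/*.json")

# Land it untouched in the raw zone, partitioned only by ingest date, so the
# original payload stays available for any future tool or schema.
ingest_date = date.today().isoformat()
(raw.write
    .mode("append")
    .text(f"s3a://lake/raw/clickstream/ingest_date={ingest_date}/"))

spark.stop()
```

Because nothing is transformed on the way in, a bad schema guess never corrupts the source of record; any downstream team can reparse the raw files with whatever structure suits its analysis.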

Most data lakes are built using Hadoop, an open-source framework. Hadoop isn’t necessarily required, but it is where most companies are headed. Russom praises Hadoop’s benefits, such as the ability to manage multi-structured and unstructured data, and a relatively small cost compared to relational databases like MySQL. Russom says, “Hadoop is not just storage. Equally important is that it’s a powerful processing platform for a wide range of analytics, both set-based and algorithmic.”
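Russom's distinction between set-based and algorithmic analytics is easy to picture in code. The sketch below, assuming PySpark and an illustrative orders dataset (the path, table, and column names are my own, not from the webinar), runs an ordinary SQL aggregation and a clustering job against the same Hadoop-resident data.

```python
# A hedged sketch of "set-based and algorithmic" processing on one dataset.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("lake-analytics").getOrCreate()
orders = spark.read.parquet("s3a://lake/refined/orders/")  # illustrative path

# Set-based: an ordinary SQL aggregation over the lake.
orders.createOrReplaceTempView("orders")
by_region = spark.sql(
    "SELECT region, SUM(amount) AS revenue FROM orders GROUP BY region")
by_region.show()

# Algorithmic: cluster the same data with MLlib, with no copy into a separate system.
features = VectorAssembler(
    inputCols=["amount", "items"], outputCol="features").transform(orders)
model = KMeans(k=4, featuresCol="features").fit(features)
print(model.clusterCenters())

spark.stop()
```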

Companies are using data lakes for analytics, reporting, marketing, sales, and more. Best of all, a data lake helps companies get business value from both old and new data.

Without some smart management of the data going into the lake (if you simply launch a Hadoop-powered data lake and throw everything into it), you're going to end up with a "toxic dump," according to Chuck Yarbrough, the senior director of solutions marketing and management at Pentaho, who also presented during the webinar.

The challenge is that incoming data varies in volume, diversity, type, and whether metadata is present at all. It's a lot to think about, but the ability to ingest that data in a managed way is essential if you want a variety of users to actually take advantage of it.
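One common way to keep ingestion from turning the lake into that dump is to capture a little metadata for every file as it lands. The sketch below is a plain-Python illustration assuming a simple file-based catalog; the fields, paths, and function name are assumptions for the example, not a Pentaho or TDWI prescription.

```python
# A minimal sketch of metadata capture at ingestion time: every landed file
# gets a small catalog record so it can be found, trusted, and traced later.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

CATALOG = Path("/lake/_catalog")  # illustrative catalog location

def register_ingest(raw_file: Path, source: str, owner: str) -> dict:
    """Record where a file came from, when it arrived, and a content hash."""
    record = {
        "path": str(raw_file),
        "source": source,
        "owner": owner,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(raw_file.read_bytes()).hexdigest(),
        "size_bytes": raw_file.stat().st_size,
    }
    CATALOG.mkdir(parents=True, exist_ok=True)
    entry = CATALOG / f"{record['sha256']}.json"
    entry.write_text(json.dumps(record, indent=2))
    return record

# Example (hypothetical file and owner):
# register_ingest(Path("/lake/raw/clickstream/2017-01-15/events.json"),
#                 source="web-clickstream", owner="marketing-analytics")
```

Even a lightweight record like this gives analysts a way to discover what is in the lake and where it came from, which is most of the difference between a lake and a dump.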

 
