3 keys to keep your data lake from becoming a data swamp

by 7wData
June 16, 2017

For years, buoyed by technologies like Apache Hadoop, organizations have been seeking to build data lakes: enterprise-wide data management platforms that allow them to store all of their data in its native format. Data lakes promise to break down information silos by providing a single data repository the entire organization can use for everything from business analytics to data mining. Raw and ungoverned, data lakes have been pitched as a big data catch-all and cure-all.

But Avi Perez, CTO of business intelligence (BI) software specialist Pyramid Analytics, says he sees many customers and prospects whose data lakes are deteriorating into data swamps: massive repositories of data that are completely inaccessible to end users.

"Databases are really expensive," Perez says. "The data lake fundamentally answers that problem. Data lakes, and all big data initiatives, come from, one, pressure in the marketplace to have one, and secondly, real-world data generators spitting up gobs of data that you need to find a way to store."

But while a number of the world's most successful companies have built businesses around their data lakes (Google is a prime example), many others are collecting data without any clear way to get value from it.

"They just collect dust," Perez says. "You're just collecting junk. I think they'll get abandoned. Eventually you cut the budget for stuff that's big and expensive and not doing anything."

That's not to say the idea behind data lakes is a bad one. Perez is convinced that all companies will need one eventually. But creating a data lake that your end users can actually benefit from requires deliberation.

To avoid drowning in your own data lake, Perez recommends adopting three principles.

Perez says one of the biggest mistakes organizations make is collecting too much data, simply because they can. Consider your smartphone. If you own one, chances are you've got hundreds of pictures, if not more, stored on it.

"You end up with a billion pictures on your phone, and yet 99 percent of them are probably garbage that you would get rid of in a heartbeat," he says. "It's gotten so easy to take pictures with your phone, it's essentially free. And you probably think, 'One day I'll go and clean it up,' but of course no one ever does. You're collecting an enormous amount of information, but you have no way to work your way through it to use it effectively."

When you inevitably want to show someone a particular photograph, finding it can require scrolling through an enormous volume of junk.

The same thing happens with data lakes, Perez says. Storing data in Hadoop is inexpensive enough that it's often considered free.
