Improving the Quality of Data on Hadoop

As the value and volume of data explode, so does the need for mature data management. Big data is now receiving the same treatment as relational data (integration, transformation, process orchestration, and error recovery), so the quality of big data is becoming critical.

Because of the promise and capacity of Hadoop, data quality was initially overlooked. However, not all Hadoop use cases are for analytics; some are driving critical business processes. Data quality is now a key consideration for process improvement and decision making based on data coming out of Hadoop.

With the size of our data stores in Hadoop, we must consider whether data quality practices can scale to the potential immensity of big data. Hadoop shatters the limits of data storage, not only in terms of data volume and variety but also in terms of structure. One way that data quality is maintained in a conventional data warehouse is by imposing strict limits on the volume, variety, and structure of data. This is in direct opposition to the advantages that Hadoop and NoSQL offer.

We must also consider the cost of poor data quality within a Hadoop cluster. From an analytics perspective, "bad data" may not be as troublesome as it once was, if we consider the statistical insignificance of incorrect, incomplete, or inaccurate records. The effect of a statistical outlier or anomaly is reduced by the massive amounts of data around it; the sheer volume effectively drowns it out.
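The dilution effect described above can be illustrated with simple arithmetic. The sketch below is only an illustration: the function name `mean_shift` and the values (a "good" reading of 100, a corrupt reading of 10,000, ten bad records) are hypothetical and not drawn from any real cluster. It shows how far a fixed number of bad records pulls the average as the surrounding volume of good data grows.

```python
def mean_shift(n_good, n_bad, good_value=100.0, bad_value=10_000.0):
    """Return how far n_bad corrupt records pull the mean away from good_value.

    All values are hypothetical and chosen only for illustration.
    """
    total = n_good * good_value + n_bad * bad_value
    mean = total / (n_good + n_bad)
    return mean - good_value


# Hold the number of bad records fixed at 10 and grow the good data around them.
for n_good in (1_000, 1_000_000, 1_000_000_000):
    shift = mean_shift(n_good, n_bad=10)
    print(f"{n_good:>13,} good records: mean shifts by {shift:.6f}")
```

The shift shrinks as the good data grows, but only because the count of bad records is held fixed here. If bad records grew in proportion to total volume, the shift would not shrink.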

In conventional data analysis and data warehousing practice, "bad data" was something to be detected, cleansed, reconciled, and purged.

 


