Why Are We Treating Data Like a Picasso?


The models of Provenance, Lineage, and Chain of Custody are used in fine art to determine when a piece was created, the sequence of locations where it was held, how it was touched along the way, and who has owned it since creation, all with the purpose of authenticating the piece. What does this have to do with boring data?

It turns out that many decisions that affect our daily lives are made using a single final result, or score, which is derived from many other pieces of data. What if one of those pieces of data was wrong or stale? This could lead to “Bad Data”, and the consequences can range from the inconvenient to the catastrophic. We must understand the data components used to calculate a final number to ensure the result is valid and current. This is why we need to adopt the models of Data Provenance, Data Lineage, and Data Chain of Custody, and make them an intrinsic part of any data-driven decision.
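To make the stale-input problem concrete, here is a minimal Python sketch. It assumes each input to a score carries its source and the time it was last refreshed. The names `Component` and `stale_components`, the sample inputs, and the 30-day threshold are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Component:
    """One input to a final score (hypothetical structure)."""
    name: str
    value: float
    source: str              # where the value came from
    refreshed_at: datetime   # when the value was last updated


def stale_components(components, now, max_age=timedelta(days=30)):
    """Return the names of inputs older than max_age (an assumed threshold)."""
    return [c.name for c in components if now - c.refreshed_at > max_age]


# Illustrative data only; not taken from any real system.
now = datetime(2017, 5, 1)
inputs = [
    Component("payment_history", 0.9, "bank_feed", datetime(2017, 4, 20)),
    Component("address_on_file", 1.0, "crm_export", datetime(2016, 11, 3)),
]

stale = stale_components(inputs, now)
if stale:
    print("Score should not be trusted; stale inputs:", stale)
```

The point of the sketch is that a score can only be checked for freshness if each of its inputs records where it came from and when it was last updated.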


Let me start with a few examples:

Estimates of the cost of “Bad Data” range from $611 billion each year for U.S. firms, according to TDWI (The Data Warehousing Institute), to $3.1 trillion per year, according to IBM. Either figure is staggering, and neither accounts for the individual lives affected.

The causes of Bad Data typically fall into these categories:

The right solution needs to address all these issues under the umbrella of Data Governance, and it must provide a full audit trail that records and verifies every event that could change any piece of data going into a meaningful calculation. It must enable enterprises to track and monitor data properly via Data Provenance, Data Lineage, and Data Chain of Custody.

Data Provenance refers to the “origin” and “source” of data: where a piece of data came from and the process by which it came to be in its present state.

Data Lineage is the process of tracing and recording the origins of data and its movement between databases or systems; it tracks the data life cycle from its origin to its destination over time, and what happens as it goes through diverse processes.
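As a rough illustration of the two definitions above, the following Python sketch keeps a value's origin (provenance) alongside an append-only list of the systems and processes it has passed through (lineage). The class names, system names, and sample values are hypothetical. This is one possible shape for such a record, not a reference to any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class LineageEvent:
    """One hop in the data's life cycle (hypothetical structure)."""
    system: str    # where the data landed
    process: str   # what was done to it on the way
    at: datetime   # when the hop happened


@dataclass
class TrackedValue:
    value: float
    origin: str                                   # provenance: the original source
    history: list = field(default_factory=list)   # lineage: ordered hops

    def move(self, system, process, at, new_value=None):
        """Record a hop; optionally record the value the process produced."""
        self.history.append(LineageEvent(system, process, at))
        if new_value is not None:
            self.value = new_value

    def path(self):
        """Systems visited, from origin to current location."""
        return [self.origin] + [event.system for event in self.history]


# Illustrative data only.
income = TrackedValue(52000.0, origin="payroll_db")
income.move("staging", "nightly_extract", datetime(2017, 5, 1, 2, 0))
income.move("warehouse", "round_to_thousands", datetime(2017, 5, 1, 3, 0),
            new_value=52000.0)

print(" -> ".join(income.path()))
```

Because the history is only ever appended to, anyone questioning the final value can walk it backwards to see which process touched the data and when.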

