Why Are We Treating Data Like a Picasso?

Why Are We Treating Data Like a Picasso?

Why Are We Treating Data Like a Picasso?

Hortonworks DataFlow is an integrated platform that makes data ingestion fast, easy, and secure. Download the white paper now.  Brought to you in partnership with Hortonworks. 

The models of Provenance, Lineage, and Chain of Custody are used in fine art to determine when a piece was created, the sequence of locations where it was held, how it was touched along the way, and who has owned it since creation, all with the purpose of authenticating the piece. What does this have to do with boring data?

It turns out many decisions which affect our daily lives are made using a single final result – or score – which is derived from many other pieces of data. What if one of those pieces of data was wrong or stale? This could lead to “Bad Data”, and the consequences can range from the inconvenient to the catastrophic. We must understand the data components used to calculate a final number to ensure the result is valid and current; this is why we need to adopt the models of Data provenance, Data Lineage and Data Chain of Custody, and make them an intrinsic part of any data driven decision.

Read Also:
Sisense wants to make every user a data scientist

Let me start with a few Examples:

The cost of “Bad Data” ranges from TDWI (The Data Warehousing Institute) estimate of $611 billion each year for U.S. firms, to IBM’s $3.1 trillion per year figure, either figure is simply staggering, not to mention the individual lives affected by this.

The causes of Bad Data typically fall into these categories:

The right solution needs to address all these issues under the umbrella of Data Governance, and it must provide a full audit trail to record and verify all events that could change every piece of data going into a meaningful calculation. It must enable enterprises to have the proper tracking and monitoring of data via Data Provenance, Data Lineage, and Data Chain of Custody.

Data Provenance refers to the “origin” and “source” of data – where a piece of data came from and the process by which came to be in its present state.

Read Also:
How neuromorphic ‘brain chips’ will begin the next era in computing

Data Lineage is the process of tracing and recording the origins of data and its movement between databases or systems; it tracks the data life cycle from its origin to its destination over time, and what happens as it goes through diverse processes.

 



Data Science Congress 2017

5
Jun
2017
Data Science Congress 2017

20% off with code 7wdata_DSC2017

Read Also:
Unlocking the value in patient-generated health data

AI Paris

6
Jun
2017
AI Paris

20% off with code AIP17-7WDATA-20

Read Also:
Birdwatcher: Data analysis and OSINT framework for Twitter

Chief Data Officer Summit San Francisco

7
Jun
2017
Chief Data Officer Summit San Francisco

$200 off with code DATA200

Read Also:
How Restaurants are Cooking Up Insights with Analytics

Customer Analytics Innovation Summit Chicago

7
Jun
2017
Customer Analytics Innovation Summit Chicago

$200 off with code DATA200

Read Also:
How predictive analytics discovers a data breach before it happens

HR & Workforce Analytics Innovation Summit 2017 London

12
Jun
2017
HR & Workforce Analytics Innovation Summit 2017 London

$200 off with code DATA200

Read Also:
Data surfaces small insights that yield winning results
Read Also:
Data is Beautiful: 7 Data Visualization Tools for Digital Marketers

Leave a Reply

Your email address will not be published. Required fields are marked *