Four things you should know about open data quality

Better data can lead to better outcomes. But what do we mean by ‘data quality’? ODI Associate Leigh Dodds, who is working with ODI Partner Experian to explore data quality in UK open datasets, explains.

One way to help improve data quality is to generate quality metrics for a dataset. CC BY 2.0, uploaded by Elizabeth Hahn.

First impressions are everything. The efforts made to publish a dataset will guide a user’s experience in finding, accessing and using it. No matter how good the contents of your dataset, if it is not clearly documented, well-structured and easily accessible, then it won’t get used.

Open data certificates are a mark of quality and trust for open data. They measure the legal, technical, practical and social aspects of publishing data. Creating and publishing a certificate will help a publisher build confidence in their data. Open data certificates complement the five-star scheme, which assesses how well data is integrated with the web.


2. A dataset can contain a variety of problems

Data quality also relates to the contents of a dataset. Data errors usually arise when the data is originally collected, but the problems may only become apparent once a user begins working with the data.

There are a number of different types of data quality problem. The following list isn’t exhaustive but includes some of the most common:

The dataset isn’t valid when compared to its schema: for example, there are missing columns, or they are in the wrong order.

The dataset contains invalid or incorrect values: for example, numbers that are not within their expected range, text where there should be numbers, spelling mistakes or invalid phone numbers.

The dataset has missing data from some fields, or the dataset doesn’t include all of the available data. Some addresses in a dataset might be missing their postcode, for example.

The data may have precision problems. These may be due to limits in the accuracy of the sensors or other devices (such as GPS devices) that were used to record the data, or they may be due to simple rounding errors introduced during analysis.
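To make the first three problem types concrete, here is a minimal sketch in Python of how they might be counted for a small CSV file. The column names, the temperature range and the sample rows are all hypothetical, chosen only for illustration. Precision problems are not covered, because they generally cannot be detected from the values alone.

```python
import csv
import io

# Hypothetical schema: the column names we expect, in the expected order.
EXPECTED_COLUMNS = ["station_id", "postcode", "temperature_c"]
# Assumed plausible range for the numeric column, for illustration only.
TEMPERATURE_RANGE = (-50.0, 60.0)


def check_quality(csv_text):
    """Return simple quality counts for a CSV document held in a string."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    report = {
        "schema_valid": header == EXPECTED_COLUMNS,
        "rows": 0,
        "missing_values": 0,
        "non_numeric": 0,
        "out_of_range": 0,
    }
    if not report["schema_valid"]:
        # Column positions cannot be trusted, so skip the value checks.
        return report

    low, high = TEMPERATURE_RANGE
    for row in reader:
        report["rows"] += 1
        # Pad short rows so that every expected column has a cell.
        cells = [c.strip() for c in row]
        cells += [""] * (len(EXPECTED_COLUMNS) - len(cells))
        record = dict(zip(EXPECTED_COLUMNS, cells))

        # Missing data: count empty cells in any column.
        report["missing_values"] += sum(1 for v in record.values() if v == "")

        raw = record["temperature_c"]
        if raw == "":
            continue
        try:
            value = float(raw)
        except ValueError:
            # Text where there should be a number.
            report["non_numeric"] += 1
            continue
        if not low <= value <= high:
            # A number outside its expected range.
            report["out_of_range"] += 1
    return report


# Made-up sample data containing one example of each problem.
SAMPLE = """station_id,postcode,temperature_c
A1,SW1A 1AA,12.5
A2,,14.0
A3,EC2A 4JE,warm
A4,M1 1AE,250
"""

report = check_quality(SAMPLE)
print(report)
```

Counts like these are one simple form of quality metric. A real check would be driven by the dataset's published schema rather than by hard-coded constants.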


3.

 


