Welcome back to our series on managing the data landscape and getting the most value out of your data. In the first article in the series, "5 Critical Success Factors to Turn Data Into Insight," and the ones that follow it, we define five capabilities that play key roles in the success and repeatability of an actionable analytics program.
Our topic for this installment is data quality, which we can simply define as data “fit for purpose.” Obviously, data used for analytics has to be accurate. But there is a lot more to data quality. After all, if the data is not of sufficient quality – even if you know where to find it and how to use it – it still may not serve the purpose you need to deliver the insight you are seeking.
Data quality is more than just accuracy; it also depends on how the data is used. Usage brings further dimensions of data quality into play, including timeliness and relevance alongside accuracy.
Another important and often overlooked aspect of data quality is that it sits at the intersection of business goals and data understanding. An organization’s business goals should drive the definition and prioritization of data quality dimensions and their respective requirements. Data understanding will tell you whether the data meets those business requirements. If the data does not comply, data understanding should also indicate whom to contact, such as the appropriate data steward, to determine the steps needed to improve the quality for your needs.
There is a recent line of thinking out of the big data movement that data quality is not as important in a high-volume environment. That is, the sheer volume of data will dampen data quality issues. This can be true. It can also be entirely untrue. You need to give deliberate thought and engineering to the role and effects of data quality in whatever type of analytics solution you are planning.
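One way to see why volume helps in some cases and not in others is to separate random error from systematic error. The sketch below is illustrative only: the function name `simulate_mean`, the true value of 100, and the noise and bias figures are assumptions made for this example, not numbers from any real dataset. It averages a large number of simulated readings twice, once with random noise only and once with a constant bias added.

```python
import random
import statistics

def simulate_mean(n, noise_sd=0.0, bias=0.0, true_value=100.0, seed=0):
    """Average n simulated readings of a quantity whose true value is known.

    noise_sd: standard deviation of random (zero-mean) error per reading.
    bias:     systematic error added to every reading.
    All parameters are hypothetical and exist only for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    readings = [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return statistics.fmean(readings)

# Random error only: the average of many readings lands close to the true value.
noisy = simulate_mean(100_000, noise_sd=10.0)

# Same random error plus a constant bias: more volume does not remove the bias.
biased = simulate_mean(100_000, noise_sd=10.0, bias=5.0)

print(f"noisy mean:  {noisy:.2f}")
print(f"biased mean: {biased:.2f}")
```

In this toy setup, the random error largely averages out as the volume grows, while the systematic error survives at any volume. Which situation applies to a real source is something you have to establish, not assume.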
How then do we incorporate these aspects of data quality into achieving better insight from your data?
Our previous article, "Mastering and Managing Data Understanding," talked about understanding the data landscape and the nature of your data. The next logical step is to ask: Which data elements do you need? Where do they come from? Are they available and suitable for the intended purpose?
If you are using data for reporting, BI or analytics, you need to first understand the presentation and manipulation of the data. Normally, you are not grabbing discrete data elements. You are grabbing data and then processing it through an algorithm or formula. You are presenting the results as an analysis, scorecard or report. So the metric, KPI or algorithm determines what data you need. Since our prior article also covered defining a data landscape and inventory, the next step is sourcing the data. Now you need to apply the data quality dimensions to make sure the desired source of your data is appropriate for its usage.
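The idea that the metric determines the data you need can be sketched in a few lines. Everything named here is hypothetical: the metric `on_time_delivery_rate`, its input elements, and the inventory entries are invented for illustration and stand in for whatever your own metric definitions and data inventory contain.

```python
# Hypothetical metric definitions: each metric lists the data elements it consumes.
METRIC_INPUTS = {
    "on_time_delivery_rate": {"order_id", "promised_date", "delivered_date"},
}

# Hypothetical data inventory: data element -> system that can supply it.
DATA_INVENTORY = {
    "order_id": "order_system",
    "promised_date": "order_system",
}

def plan_sourcing(metric):
    """Return (sourced, missing) for the data elements a metric requires.

    sourced: dict mapping each required element found in the inventory
             to its source system.
    missing: sorted list of required elements with no known source.
    """
    required = METRIC_INPUTS[metric]
    sourced = {e: DATA_INVENTORY[e] for e in required if e in DATA_INVENTORY}
    missing = sorted(required - set(sourced))
    return sourced, missing

sourced, missing = plan_sourcing("on_time_delivery_rate")
print("sourced:", sourced)
print("missing:", missing)
```

Starting from the metric instead of from the available tables makes gaps visible early: any element in `missing` has to be sourced before the data quality dimensions can be applied to it.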
Review the purpose of the metric or report: when is it produced, and what is done with the result? This will give you clues as to which data quality dimensions are relevant to your efforts. An operational metric will depend on timeliness. An elaborate algorithm will require data that has adequate coverage and is of the correct historical value, without excessive decay.
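As a rough sketch of how those dimensions might be turned into checks, the example below tests one source against two different uses. The function names, the thresholds (one hour, seven days, 70% and 90% coverage), and the sample values are all assumptions chosen for illustration; real thresholds should come from the business requirement behind the metric.

```python
from datetime import datetime, timedelta

def is_timely(last_updated, as_of, max_age):
    """Timeliness: the newest record is no older than max_age."""
    return (as_of - last_updated) <= max_age

def coverage(values):
    """Coverage: share of records in which the element is populated."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)

def failed_dimensions(last_updated, values, as_of, max_age, min_coverage):
    """List the data quality dimensions this source fails for a given use."""
    failures = []
    if not is_timely(last_updated, as_of, max_age):
        failures.append("timeliness")
    if coverage(values) < min_coverage:
        failures.append("coverage")
    return failures

# One hypothetical source, last refreshed 30 hours before the check.
as_of = datetime(2024, 1, 2, 12, 0)
last_updated = datetime(2024, 1, 1, 6, 0)
values = [10, None, 12, 11]  # one of four records is unpopulated

# Operational metric: needs fresh data, tolerates some gaps.
operational = failed_dimensions(last_updated, values, as_of,
                                max_age=timedelta(hours=1), min_coverage=0.7)

# Analytical model: tolerates older data, needs fuller coverage.
analytical = failed_dimensions(last_updated, values, as_of,
                               max_age=timedelta(days=7), min_coverage=0.9)

print("operational use fails on:", operational)
print("analytical use fails on:", analytical)
```

The same source fails different dimensions depending on the use, which is the practical meaning of "fit for purpose."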
Once you have an idea of use and source, double-check that you are clear on exactly how the business will use the data.