Data Preparation: Is the Dream of Reversing the 80/20 Rule Dead?

I recently had someone ask me, “For years we’ve talked about changing analytics from 80% data prep and 20% analytics to 20% data prep and 80% analytics, yet we still seem stuck with 80% data prep. Why is that?” It is a very good question about a very real issue that frustrates many people.

I believe there is actually a good answer, and that the perceived lack of progress is not as bad as it first appears. To explain, we need to differentiate between a new data source or business problem and ones we have addressed before.

Whenever a new data source is first acquired and analyzed, a lot of initial work is required to understand, cleanse, and assess the data. Without that initial work, effective analysis isn't possible. Much of the work is a one-time effort, but it can be substantial: for example, determining how to identify and handle inaccurate sensor readings or incorrectly recorded prices.
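
To make that one-time cleansing work concrete, here is a minimal sketch in pandas of the kind of rules involved; the plausible range and spike threshold are illustrative assumptions, not taken from the article.

```python
import pandas as pd

# A minimal sketch of one-time cleansing rules for a new sensor feed.
# The plausible range and spike threshold are illustrative assumptions.
readings = pd.DataFrame(
    {"reading": [21.4, 21.7, 98.6, 22.0, -40.0, 22.3, 22.1, 21.9]}
)

# Rule 1: drop values outside the physically plausible range for the sensor.
plausible = readings["reading"].between(-10.0, 50.0)

# Rule 2: flag sudden spikes relative to a rolling median of neighbors.
rolling_med = readings["reading"].rolling(3, center=True, min_periods=1).median()
spike = (readings["reading"] - rolling_med).abs() > 5.0

readings["suspect"] = ~plausible | spike
clean = readings.loc[~readings["suspect"]]
print(clean)
```

Rules like these only have to be worked out once; after that, they can run automatically every time new data arrives.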


From the earliest days of my career, some of the most challenging work has involved new data. For the first couple of analytics on a new data source, the ratio of data prep and other grunt work to analytics is certainly much closer to 80% prep/20% analysis than to 20%/80%. However, as more analytics are completed with that data source, things become much more streamlined and efficient.

Once a data source has been utilized for a range of analytics and is well understood, developing a new analytic process with it starts to drift toward the 20/80 ratio. By making use of constructs like Enterprise Analytic Datasets, it is possible to jump almost directly into a new analysis, as long as that analysis can use the same types of metrics that past analyses relied on.
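
As an illustration of that reuse, here is a hedged sketch in which a new analysis starts from a pre-built metrics dataset rather than raw data; the table, its column names, and the selection threshold are hypothetical assumptions for the example.

```python
import pandas as pd

# A minimal sketch of starting a new analysis from a pre-built metrics
# dataset. In practice this table would be loaded from a shared store
# (e.g., pd.read_parquet("enterprise_customer_metrics.parquet")); the
# name and columns here are hypothetical.
metrics = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "total_spend_12m": [1250.0, 430.0, 2900.0, 75.0],
    "txn_count_12m": [18, 7, 41, 2],
    "avg_basket_size": [69.4, 61.4, 70.7, 37.5],
})

# Because the standardized metrics already exist, the "new" work is just
# selecting a population and summarizing or modeling it.
high_value = metrics[metrics["total_spend_12m"] > 1000]
print(high_value[["txn_count_12m", "avg_basket_size"]].describe())
```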

In fact, many large organizations have greatly standardized and streamlined the use of traditional data sources for analytics. For example, transactional data is utilized to analyze customer behavior in a wide range of industries. Many organizations have a large number of standardized customer metrics available that can feed analytics both new and old. I know of companies with tens of thousands of metrics for each customer based on transactional history.
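
To show how such transactional metrics might be built in the first place, here is a small sketch deriving recency, frequency, and spend metrics per customer; the column names and metric definitions are illustrative assumptions, standing in for the thousands of metrics a mature organization would maintain.

```python
import pandas as pd

# A sketch of deriving standardized customer metrics from transaction
# history. Column names and metric definitions are illustrative.
txns = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "txn_date": pd.to_datetime(
        ["2017-01-05", "2017-03-20", "2017-02-11",
         "2017-04-02", "2017-05-30", "2017-05-01"]),
    "amount": [25.0, 40.0, 10.0, 15.0, 30.0, 99.0],
})

as_of = pd.Timestamp("2017-06-01")
metrics = txns.groupby("customer_id").agg(
    txn_count=("amount", "size"),
    total_spend=("amount", "sum"),
    avg_amount=("amount", "mean"),
    last_txn=("txn_date", "max"),
)
metrics["recency_days"] = (as_of - metrics["last_txn"]).dt.days
print(metrics.drop(columns="last_txn"))
```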
