The Smart Way to Deal With Messy Data

Unstructured data (data that is not organized in a predefined way, such as text) is now widely available. But structure must be added to the data to make it usable for analysis, which means significant processing. That processing can be a problem.

In a form of modern alchemy, analytics processes now transmute “base” unstructured data into “noble” business value. Systems everywhere greedily salt away every imaginable kind of data. Technologies such as Hadoop and NoSQL store this hoard easily in its native unstructured form. Natural language processing, feature extraction (distilling nonredundant measures from larger data), and speech recognition now routinely alchemize vast quantities of unstructured text, images, audio, and video, preparing them for analysis. These processes are nothing short of amazing, working against entropy to create order from disorder.

Unfortunately, while these processing steps are impressive, they are far from free or free from error. I can’t help but think that a better alternative in many cases would be to avoid the need for processing altogether.

We all know how each step in a process mangles information. In the telephone game, as each person whispers to the next player what they think was said to them, words can morph into an unexpected or misleading final message. In a supply chain, layers exacerbate distortion as small mistakes and uncertainty quickly compound.

By analogy, organizations are playing a giant game of telephone with data, and unstructured data makes the game far more difficult. In a context where data janitorial activities consume 50% to 80% of scarce data scientist resources, each round of data telephone costs organizations in accuracy, effort, and time — and few organizations have a surplus of any of these three.

Within organizations, each processing step can be expensive to develop and maintain. But the growth in importance of data sharing between organizations magnifies these concerns. Our recently published report, “Analytics Drives Success with IoT,” associates business value with sharing data between organizations in the context of the internet of things. And, to foreshadow our report to be released in January, we observe similar results in the broader analytics context. But with every transfer of data, more processes need to be developed and maintained.

If this processing were unavoidable, then it would just be a cost of data sharing within or between organizations.

 


