In a 2013 report, IBM estimated the amount of data created every day at roughly 2,500,000 TB. The figure is very likely far higher now, as wearables, AI, and connected devices have increasingly embedded themselves into society, gathering a veritable tidal wave of additional information for organizations to interrogate.
This data comes in three forms: unstructured, semi-structured, and structured. Since the dawn of IT, structured data has been the main resource of analysts, and this remains the case today. In a 2015 IDG Enterprise study on big data and analytics, 83% of IT professionals said structured data initiatives were a high priority at their organizations, while just 43% said unstructured data initiatives were a top priority. Yet it is estimated that 90% of all data is either semi-structured or unstructured. For organizations, that is a tremendous number of potential insights to be leaving on the table.
Structured data is anything that fits in a relational database: values that fall within a defined set or carry a specific set of characteristics. Semi-structured data has no formal data model but does have some kind of structure, e.g. emails, zipped files, HR records, and XML data. Unstructured data, meanwhile, is everything else that does not fit into relational databases. This includes videos, PowerPoint presentations, company records, social media, RSS, documents, and text.
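To make the distinction concrete, here is a minimal Python sketch of how each form tends to be handled. The `orders` table, the employee XML record, and the review text are all invented for illustration; they do not come from any of the studies cited here.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: rows with fixed, typed columns in a relational table.
# The "orders" table and its values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.execute(
    "INSERT INTO orders (customer, total) VALUES (?, ?)", ("Acme", 129.50)
)
structured_total = conn.execute("SELECT SUM(total) FROM orders").fetchone()[0]

# Semi-structured: tags give the record some shape, but no fixed schema
# says which fields must appear or how many times.
xml_record = (
    "<employee><name>Jo</name>"
    "<skills><skill>SQL</skill></skills></employee>"
)
root = ET.fromstring(xml_record)
skills = [skill.text for skill in root.iter("skill")]

# Unstructured: free text with no fields at all. Even a simple measure
# such as a word count has to be derived from the raw content.
review = "The delivery was late and the packaging was damaged."
word_count = len(review.split())

print(structured_total, skills, word_count)
```

The structured value can be queried directly, the semi-structured record has to be navigated by its tags, and the unstructured text yields nothing until some analysis is applied to it.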
Both structured and unstructured data are necessary to use analytics to its potential, to build a full picture of a company’s health and to pinpoint areas for growth. Essentially, structured data analytics describes what’s happening, while unstructured data analytics explains why it’s happening. Knowing what’s happening may enable you to form an idea of what’s going on and take action, but without understanding why, you run too high a risk that the action is wrong.
There are several reasons that companies have hitherto largely not analyzed their unstructured data in any meaningful way, central among which is simply the absence of the necessary tools. Advances in machine learning have changed this, however, allowing organizations to analyze their mountains of unstructured content in ways they could not before.
Machine learning is valuable for the analysis of structured data, but indispensable when it comes to its unstructured counterpart because of the difference in scale. A human being simply cannot process that amount of data.