The What and Where of Big Data: A Data Definition Framework

I recently read a good article on the difference between structured and unstructured data. The author defines structured data as data that can be easily organized; as a result, these types of data are easy to analyze. Unstructured data refers to information that does not have a predefined data model or is not organized in a predefined manner, which makes it hard to analyze. A primary goal of a data scientist is to extract structure from unstructured data. Natural language processing, for example, extracts something useful (e.g., sentiment, topics) from something that is essentially useless in its raw form (e.g., text).
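To make "extracting structure from unstructured data" concrete, here is a minimal Python sketch that turns a free-text comment into a structured record. The word lists, the field names and the `extract_structure` function are hypothetical and chosen only for illustration; real sentiment analysis relies on far richer models than a keyword count.

```python
import re
from collections import Counter

# Hypothetical, tiny sentiment lexicons used only for illustration.
POSITIVE = {"great", "love", "helpful"}
NEGATIVE = {"slow", "broken", "rude"}


def extract_structure(text: str) -> dict:
    """Turn a free-text comment into a structured record (a toy example)."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Count how often words from each lexicon appear in the comment.
    positive_hits = sum(counts[word] for word in POSITIVE)
    negative_hits = sum(counts[word] for word in NEGATIVE)
    return {
        "n_tokens": len(tokens),
        "positive_hits": positive_hits,
        "negative_hits": negative_hits,
        "sentiment": positive_hits - negative_hits,
    }


record = extract_structure("Love the product, but support was slow and rude.")
print(record)
```

The raw comment is hard to aggregate, but the resulting record has fixed fields that can be stored in a table and analyzed like any other structured data.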

While I like the definitions she offers, the infographic she included is confusing. It equates the structure of the data with the source of the data, suggesting that structured data are generated solely by internal/enterprise systems while unstructured data are generated solely by social media sources. I think it would be useful to separate the format of the data (structured vs. unstructured) from the source of the data (internal vs. external).

Generally speaking, business data can come from either internal or external sources. Internal data are those under the control of the business. They are housed in financial reporting systems, operational systems, HR systems and CRM systems, to name a few. Business leaders have a large say in the quality of internal data, because these data are essentially a byproduct of the processes and systems the leaders use to run the business and to generate and store the data.

External sources of data, on the other hand, are any data generated outside the walls of the business. These sources include social media, online communities, open data sources and more. Because they originate outside the business, external data are under less of its control than internal data. They are collected by other companies, each using its own systems and processes.
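The two dimensions, format and source, can be treated as independent labels on any data set. The Python sketch below is one way to express that idea; the class names, the `quadrant` helper and the four example data sets are illustrative assumptions rather than part of the original framework (call-center notes, in particular, are my own example of internal unstructured data).

```python
from dataclasses import dataclass
from enum import Enum


class Format(Enum):
    STRUCTURED = "structured"
    UNSTRUCTURED = "unstructured"


class Source(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"


@dataclass(frozen=True)
class DataSet:
    name: str
    fmt: Format      # how the data are organized
    source: Source   # who controls the systems that generate them


# Hypothetical examples, one per combination of format and source.
examples = [
    DataSet("CRM account records", Format.STRUCTURED, Source.INTERNAL),
    DataSet("Call-center notes", Format.UNSTRUCTURED, Source.INTERNAL),
    DataSet("Open government statistics", Format.STRUCTURED, Source.EXTERNAL),
    DataSet("Social media posts", Format.UNSTRUCTURED, Source.EXTERNAL),
]


def quadrant(ds: DataSet) -> str:
    """Label a data set by source and format, e.g. 'internal/structured'."""
    return f"{ds.source.value}/{ds.fmt.value}"


# Group the example data sets into the four quadrants.
grid: dict[str, list[str]] = {}
for ds in examples:
    grid.setdefault(quadrant(ds), []).append(ds.name)

for label, names in grid.items():
    print(label, "->", names)
```

Because format and source are separate fields, all four combinations are possible, which is exactly the distinction the infographic blurs.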

 


