Wrangling Data in a Big Data World

Wrangling Data in a Big Data World

Wrangling Data in a Big Data World

It’s Monday morning and Luke, the BIGCO brand manager, walks into his office. He asks the digital assistant device on his desk how BIGCO’s Acme soft-drinks brand performed over the weekend. After a second, the assistant replies that Acme’s share of sales has fallen by 0.5%. Luke asks what’s driving the decrease, and is told it’s due to issues in the BIGCO West region. His digital assistant offers to email Tom, the sales manager, with a summary of their findings.

Two hours later in California, Tom uses the report to drill down to the root cause, combining sales data with BIGCO’s shipment data. After he blends in third-party data, including market share, weather and econometrics, it looks as if failure to promote Acme during prolonged periods of good weather is a contributing factor. Tom is able to simulate the effect of various promotions on brand share and profitability, and creates a plan. Luke approves the plan, and his digital assistant makes a note to monitor the situation and report progress.

Read Also:
Keeping transformations on target

Digital assistants and smart machines are cool, but the most useful business insights come from combining internal data with a multiplicity of external data sources, whether it be sales, shipments, promotions, financials, or a hundred other things. It’s data integration that allows this scenario to play out. Without data integration, the only thing cool technology can do is stare helplessly at a pile of bricks it can’t assemble into anything useful.

Data integration relies on the ability to link fields containing the same information, for example information about states, in different datasets. If all datasets used the same identifiers, this is would be easy, but they don’t: some use two-digit identifiers (IL, CT), some use full names (Illinois, Connecticut), and so on. And that’s a simple example: Universal Product Codes (UPC) identify a type of product (10-oz. can of BIGCO’s Acme soda, say) and can be used for point of sale and stock keeping. Electronic Product Codes can be used to identify individual items—so every single can (more likely, every bottle of champagne, as people aren't terribly interested in tracking by the soda can) could have a different code. Other codes are used to identify aggregations of products, such as in-store combo packs and warehouse pallets.

Read Also:
5 Ways to Avoid Common Pitfalls in Large-Scale Analytics Projects

Data integration means reconciling these different entities and coding systems. Part of the process is to make the data ready for analysis by either aggregating it up or disaggregating it down to a common basis so that, for example, point-of-sale data at UPC level can be combined with advertising data at brand level. Finally, data has to be enriched to enhance its usefulness for analytics: supplementing a brief product description, say, with codified attributes such as manufacturer, brand, size, flavor, packaging, health claims and ingredients.

The gold standard for data integration is the extract, transform and load (ETL) process associated with the data warehouse. ETL provides an automated, high-quality process with defined outcomes, and is the best way to curate long-lived, high-value assets, such as the data used in C-suite dashboards and KPIs.



Data Innovation Summit 2017

30
Mar
2017
Data Innovation Summit 2017

30% off with code 7wData

Read Also:
How data can inspire creativity
Read Also:
Eliminating data security weaknesses: What is proactive asset protection?

Big Data Innovation Summit London

30
Mar
2017
Big Data Innovation Summit London

$200 off with code DATA200

Read Also:
Introduction to streaming data platforms

Enterprise Data World 2017

2
Apr
2017
Enterprise Data World 2017

$200 off with code 7WDATA

Read Also:
IBM's Watson does healthcare: Data as the foundation for cognitive systems for population health

Data Visualisation Summit San Francisco

19
Apr
2017
Data Visualisation Summit San Francisco

$200 off with code DATA200

Read Also:
How to Avoid a Loveless Business Intelligence Relationship

Chief Analytics Officer Europe

25
Apr
2017
Chief Analytics Officer Europe

15% off with code 7WDCAO17

Read Also:
It’s data integration, but not as we know it

Leave a Reply

Your email address will not be published. Required fields are marked *