Wrangling Data in a Big Data World

by 7wData
February 7, 2017

It’s Monday morning and Luke, the BIGCO brand manager, walks into his office. He asks the digital assistant device on his desk how BIGCO’s Acme soft-drinks brand performed over the weekend. After a second, the assistant replies that Acme’s share of sales has fallen by 0.5%. Luke asks what’s driving the decrease, and is told it’s due to issues in the BIGCO West region. His digital assistant offers to email Tom, the sales manager, with a summary of their findings.

Two hours later in California, Tom uses the report to drill down to the root cause, combining sales data with BIGCO’s shipment data. After he blends in third-party data, including market share, weather and econometrics, it looks as if failure to promote Acme during prolonged periods of good weather is a contributing factor. Tom is able to simulate the effect of various promotions on brand share and profitability, and creates a plan. Luke approves the plan, and his digital assistant makes a note to monitor the situation and report progress.

Digital assistants and smart machines are cool, but the most useful business insights come from combining internal data with a multiplicity of external data sources, whether it be sales, shipments, promotions, financials, or a hundred other things. It’s data integration that allows this scenario to play out. Without data integration, the only thing cool technology can do is stare helplessly at a pile of bricks it can’t assemble into anything useful.

Data integration relies on the ability to link fields containing the same information, for example information about states, in different datasets. If all datasets used the same identifiers, this would be easy, but they don’t: some use two-letter codes (IL, CT), some use full names (Illinois, Connecticut), and so on. And that’s a simple example. Product identifiers are harder: Universal Product Codes (UPC) identify a type of product (10-oz. can of BIGCO’s Acme soda, say) and can be used for point of sale and stock keeping. Electronic Product Codes can be used to identify individual items, so every single can (more likely, every bottle of champagne, as people aren't terribly interested in tracking by the soda can) could have a different code. Other codes are used to identify aggregations of products, such as in-store combo packs and warehouse pallets.
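The state-identifier mismatch described above can be sketched in a few lines of Python. This is a minimal illustration, not a production approach: the lookup table, the two sample datasets, and the function names normalize_state and link_on_state are all hypothetical and were made up for this example.

```python
# Hypothetical lookup table; a real one would cover every state and territory.
STATE_NAME_TO_CODE = {
    "illinois": "IL",
    "connecticut": "CT",
    "california": "CA",
}
VALID_CODES = set(STATE_NAME_TO_CODE.values())


def normalize_state(value):
    """Return a two-letter state code, or None if the value is not recognized."""
    cleaned = value.strip()
    if cleaned.upper() in VALID_CODES:
        return cleaned.upper()
    return STATE_NAME_TO_CODE.get(cleaned.lower())


def link_on_state(sales, weather):
    """Join two dicts keyed by state, after mapping both to two-letter codes."""
    weather_by_code = {normalize_state(k): v for k, v in weather.items()}
    linked = {}
    for state, units in sales.items():
        code = normalize_state(state)
        linked[code] = {"units": units, "weather": weather_by_code.get(code)}
    return linked


# Two made-up datasets that identify states differently.
sales_by_state = {"IL": 1200, "CT": 450}
weather_by_state = {"Illinois": "sunny", "Connecticut": "rain"}

linked = link_on_state(sales_by_state, weather_by_state)
```

Mapping every source to one canonical code before joining keeps the matching logic in a single place. Values that cannot be recognized come back as None, so they can be reviewed rather than silently mismatched.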

Data integration means reconciling these different entities and coding systems. Part of the process is to make the data ready for analysis by either aggregating it up or disaggregating it down to a common basis so that, for example, point-of-sale data at UPC level can be combined with advertising data at brand level. Finally, data has to be enriched to enhance its usefulness for analytics: supplementing a brief product description, say, with codified attributes such as manufacturer, brand, size, flavor, packaging, health claims and ingredients.
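As a rough sketch of aggregating to a common basis, the Python below rolls UPC-level point-of-sale rows up to brand level and then combines them with brand-level advertising data. Everything in it is assumed for illustration: the UPC strings are placeholders rather than valid codes, and the product master, sales figures, ad spend, and function names are invented.

```python
from collections import defaultdict

# Hypothetical product master mapping each UPC to its brand.
# The UPC strings are placeholders, not real codes.
UPC_TO_BRAND = {
    "000000000001": "Acme",
    "000000000002": "Acme",
    "000000000003": "OtherBrand",
}

# Made-up point-of-sale rows at UPC level: (upc, units_sold).
pos_rows = [
    ("000000000001", 100),
    ("000000000002", 50),
    ("000000000003", 70),
    ("000000000001", 25),
]

# Made-up advertising data, already at brand level.
ad_spend_by_brand = {"Acme": 5000.0, "OtherBrand": 2000.0}


def aggregate_to_brand(rows, upc_to_brand):
    """Sum UPC-level units up to brand level, skipping unmapped UPCs."""
    totals = defaultdict(int)
    for upc, units in rows:
        brand = upc_to_brand.get(upc)
        if brand is None:
            continue  # no brand known for this UPC; leave it out of the totals
        totals[brand] += units
    return dict(totals)


def combine(units_by_brand, spend_by_brand):
    """Put brand-level units next to brand-level ad spend."""
    return {
        brand: {"units": units, "ad_spend": spend_by_brand.get(brand)}
        for brand, units in units_by_brand.items()
    }


units_by_brand = aggregate_to_brand(pos_rows, UPC_TO_BRAND)
combined = combine(units_by_brand, ad_spend_by_brand)
```

Once both datasets sit at brand level they can be compared directly. The same product master could also carry the enrichment attributes mentioned above, such as size, flavor, or packaging.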

The gold standard for data integration is the extract, transform and load (ETL) process associated with the data warehouse. ETL provides an automated, high-quality process with defined outcomes, and is the best way to curate long-lived, high-value assets, such as the data used in C-suite dashboards and KPIs.
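To make the three ETL stages concrete, here is a toy sketch using only the Python standard library. It is an assumption-laden illustration, not how a real data warehouse pipeline is built: the inline CSV, the sales table, the lookup dictionary, and the extract, transform, and load functions are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Made-up source data with inconsistent state identifiers.
RAW_CSV = """state,units
Illinois,1200
CT,450
illinois,300
"""

# Hypothetical lookup used during the transform step.
STATE_NAME_TO_CODE = {"illinois": "IL", "connecticut": "CT"}


def extract(text):
    """Extract: read raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: map state names to two-letter codes and cast units to int."""
    records = []
    for row in rows:
        raw = row["state"].strip()
        code = STATE_NAME_TO_CODE.get(raw.lower(), raw.upper())
        records.append((code, int(row["units"])))
    return records


def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (state TEXT, units INTEGER)")
    conn.executemany("INSERT INTO sales (state, units) VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
load(conn, transform(extract(RAW_CSV)))

# After loading, rows that used different identifiers roll up together.
totals = dict(conn.execute("SELECT state, SUM(units) FROM sales GROUP BY state"))
```

Keeping the stages as separate functions mirrors the defined, repeatable outcomes that make ETL suitable for long-lived data assets. Each step can be tested and rerun on its own.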
