What Catalog Shopping Can Teach Us About Data

You know all those product catalogs sitting on your coffee table (or in your recycle bin) and the websites you visit to buy gifts? They hold interesting lessons for how information can be consumed in the big data era.

Like an inventory, a catalog should list everything available for consumption (and nothing that isn’t), but that’s not enough. An Amazon product page, for example, includes pictures, specs, reviews, and recommendations. Together, these pieces of information help the shopper decide what to buy.

Consuming data also requires rich context. Before embarking on a research project, an analyst needs to understand the shape of the data set, its source, whether it is up to date, who else has used it, and how it was used. To address those requirements, a catalog should provide data samples and statistical profiles, lineage, lists of users and stewards, and tips on how the data should be interpreted.
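To make that list of context concrete, here is a minimal sketch of what a single catalog record might hold, assuming Python 3.9 or later. The `CatalogEntry` and `ColumnProfile` classes, the `profile_rows` helper, and the `sales.orders` data set are hypothetical illustrations, not the schema of any particular catalog product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ColumnProfile:
    """Basic statistics for one column (hypothetical structure)."""
    non_null: int  # number of rows with a value
    nulls: int     # number of rows missing a value
    distinct: int  # number of distinct non-null values


@dataclass
class CatalogEntry:
    """Hypothetical catalog record holding the context an analyst needs."""
    name: str
    source: str         # where the data comes from
    last_updated: date  # helps judge whether the data is up to date
    lineage: list[str] = field(default_factory=list)   # upstream data sets
    stewards: list[str] = field(default_factory=list)  # who is responsible for it
    users: list[str] = field(default_factory=list)     # who else has used it
    tips: list[str] = field(default_factory=list)      # how to interpret it
    sample: list[dict] = field(default_factory=list)   # a few example rows
    profile: dict[str, ColumnProfile] = field(default_factory=dict)


def profile_rows(rows: list[dict]) -> dict[str, ColumnProfile]:
    """Compute null and distinct counts for every column seen in `rows`."""
    columns = sorted({key for row in rows for key in row})
    profiles = {}
    for col in columns:
        values = [row.get(col) for row in rows]  # missing keys count as nulls
        present = [v for v in values if v is not None]
        profiles[col] = ColumnProfile(
            non_null=len(present),
            nulls=len(values) - len(present),
            distinct=len(set(present)),
        )
    return profiles


# Illustrative data only: a tiny, made-up orders table.
rows = [
    {"order_id": 1, "state": "HI", "amount": 25.0},
    {"order_id": 2, "state": "CA", "amount": None},
    {"order_id": 3, "state": "HI", "amount": 12.5},
]
entry = CatalogEntry(
    name="sales.orders",
    source="order management system",
    last_updated=date(2017, 1, 15),
    lineage=["raw.order_events"],
    stewards=["sales-data-team"],
    users=["ana", "ben"],
    tips=["amount may be null; check with the stewards before treating it as zero"],
    sample=rows[:2],
    profile=profile_rows(rows),
)
print(entry.profile["state"])   # ColumnProfile(non_null=3, nulls=0, distinct=2)
print(entry.profile["amount"])  # ColumnProfile(non_null=2, nulls=1, distinct=2)
```

The profile here covers only null and distinct counts; a real catalog would likely add richer statistics (value ranges, distributions) and refresh them as the underlying data changes.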

Yesterday’s data challenge was all about collecting the relevant data for analysis and producing the relevant reports. Today, many organizations have the data and computational resources to answer almost any analytical question. The hard part is finding the most relevant, trustworthy data sets and metrics, which can be like hunting down a limited-edition Darth Vader Pez dispenser for Uncle Jack.

A 21st-century data catalog should do the following:

Some catalogs may try to be a source of truth about the right table to consult for a given purpose, the right categorization of a given value, or the right way to calculate a given metric. If everyone consulted and respected them, such prescriptive catalogs could, in theory, align an entire organization and reduce inconsistency and confusion. In practice, however, prescriptivism poses challenges for large enterprises, where different teams can legitimately need different answers. The finance department, for example, may group Hawaii with the other states, while the logistics team responsible for shipping lumps it in with Puerto Rico and Guam.

A better approach is to document what people are actually doing. Who is querying which tables, viewing which reports, or using a particular calculation for a given metric? Usage is evidence of trust: a data asset or technique used just once by an intern probably isn’t trustworthy.
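The descriptive approach can be sketched as a usage summary built from an access log. This sketch assumes a hypothetical log of `(user, asset)` pairs, where an asset might be a table, a report, or a metric definition; `usage_summary`, `thinly_used`, and the default thresholds are illustrative choices, not a standard. Counting queries and distinct users is only one signal, and the sketch deliberately ignores who the users are (an intern versus a senior analyst).

```python
from collections import defaultdict


def usage_summary(log):
    """Count queries and distinct users per asset from (user, asset) pairs.

    The log format is a hypothetical simplification of a query or access history.
    """
    queries = defaultdict(int)
    users = defaultdict(set)
    for user, asset in log:
        queries[asset] += 1
        users[asset].add(user)
    return {
        asset: {"queries": queries[asset], "users": sorted(users[asset])}
        for asset in queries
    }


def thinly_used(summary, min_queries=2, min_users=2):
    """List assets whose usage is too thin to suggest trust.

    The default thresholds are arbitrary illustrations, not recommended values.
    """
    return sorted(
        asset
        for asset, stats in summary.items()
        if stats["queries"] < min_queries or len(stats["users"]) < min_users
    )


# Illustrative log: who touched which table or report.
log = [
    ("ana", "sales.orders"),
    ("ben", "sales.orders"),
    ("chen", "sales.orders"),
    ("ana", "report:weekly_revenue"),
    ("ben", "report:weekly_revenue"),
    ("intern_1", "tmp.orders_copy"),
]
summary = usage_summary(log)
print(summary["sales.orders"])  # {'queries': 3, 'users': ['ana', 'ben', 'chen']}
print(thinly_used(summary))     # ['tmp.orders_copy']
```

Because the summary only records what people do, it stays neutral about which grouping or calculation is "right." A real catalog might also weight usage by role or recency, which this sketch does not attempt.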