The “problem-solver” approach to data preparation

In many environments, the maturity of your reporting and business analytics functions depends on how effectively you manage data before it’s time to analyze it. Traditional environments relied on a provisioning effort to conduct data preparation for analytics. After being extracted from source systems, the data landed in a staging area for cleansing, standardization and reorganization before being loaded into a data warehouse.

Recently, there has been significant innovation in the evolution of end-user discovery and analysis tools. Often, these systems allow the analyst to bypass the traditional data warehouse by accessing the source data sets directly. This is putting more data – and analysis of that data – in the hands of more people. This encourages “undirected analysis,” which doesn’t pose any significant problems; the analysts are free to point their tools at any (or all!) data sets, with the hope of identifying some nugget of actionable knowledge that can be exploited.

However, it would be naïve to presume that many organizations are willing to allow a significant amount of “data-crunching” time to be spent on purely undirected discovery. Rather, data scientists are directed to solve particular types of business problems, such as analyzing logistics and facets of the supply chain to optimize delivery channels.

Different challenges have different data needs, but if the analysts need to use data from the original sources, it’s worth considering an alternate approach to the conventional means of data preparation. The data warehouse approach balances two key goals: organized data inclusion (a large amount of data is integrated into a single data platform), and objective presentation (data is managed in an abstract data model specifically suited for querying and reporting).

A new approach to data preparation for analytics

Does the data warehouse approach work in more modern, “built-to-suit” analytics? Maybe not, especially if data scientists go directly to the data, bypassing the data warehouse altogether. For data scientists armed with analytics at their fingertips, let’s consider a rational, five-step approach to problem-solving:

1. Clarify the question you want to answer.
2. Identify the information necessary to answer the question.
3. Determine what information is available and what is not available.
4. Acquire the information that is not available.

In this process, steps 2, 3, and 4 all deal with data assessment and acquisition, but in a way that is diametrically opposed to the data warehouse approach. First, the warehouse’s data inclusion is pre-defined, which means that the data that is not available at step 3 may not be immediately accessible from the warehouse in step 4.
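
To make steps 2 through 4 concrete, here is a minimal sketch in Python of how an analyst might compare the information a question requires against what the existing sources already hold. Everything in it is an assumption made for illustration: the function name `assess_data_needs`, the field names, and the source names do not come from any particular tool or from the article itself.

```python
# Illustrative sketch only: field names and source names are hypothetical.

def assess_data_needs(required, available_by_source):
    """Split required fields into those already located and those to acquire.

    required: list of field names needed to answer the question (step 2).
    available_by_source: dict mapping a source name to the fields it holds.
    Returns (located, missing): located maps each found field to the first
    source that holds it (step 3); missing lists fields still to acquire (step 4).
    """
    located = {}
    for source, fields in available_by_source.items():
        for field in fields:
            # Record only required fields, keeping the first source found.
            if field in required and field not in located:
                located[field] = source
    missing = sorted(set(required) - set(located))
    return located, missing


# Step 2: the information needed for a (hypothetical) delivery-channel question.
required = ["shipment_id", "carrier", "delivery_date", "fuel_cost"]

# Step 3: what each (hypothetical) source currently provides.
available = {
    "warehouse": ["shipment_id", "carrier"],
    "orders_db": ["shipment_id", "delivery_date"],
}

located, missing = assess_data_needs(required, available)
print("Already available:", located)
print("Still to acquire:", missing)
```

In this sketch the list of fields still to acquire is driven by the question rather than by what the warehouse was designed to contain, which is the contrast the steps above draw with a pre-defined data inclusion.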


