Hypothesis-driven thinking in data science

“In God we trust; all others bring data” is a popular quote attributed to W. Edwards Deming, a physicist by training and a pioneer of quality management. By applying statistical methods from the natural sciences to manufacturing in the first half of the 20th century, he was able to drastically increase the industry’s efficiency. In a way, Deming was a predecessor of modern data scientists: he applied his scientific methodology to business problems.

Since data science is a comparatively new field, there are probably more definitions of data science than there are data scientists. In my opinion, data science is not about data but about a certain way of thinking about data. Today, I will give an example of one of the most frequent tasks of data scientists, and show why you don’t need to be a data scientist to follow the essence of their work in a project.

The process of converting data into meaningful information is nowadays called Business Intelligence. It helps businesses to report data and understand ‘what’, ‘where’, ‘when’ and ‘how much’.

Using data in decision making, by tracking Key Performance Indicators (KPIs) and predicting their future values, is a more sophisticated use of data than reporting it. Such an application of data in the business context is usually termed (business) data analytics. It guides businesses on how to proceed and helps them understand the ‘what next’.

Fig. 1: Pyramid of Data Science, Data Analytics and Business Intelligence with their "added values".

However, at no point have we really addressed the ‘why’. From my point of view, understanding (that is, answering the ‘why’ by finding reasons and extracting knowledge using data) is an entirely different process, and its owners are data scientists.

Put differently, data scientists try to understand data instead of just crunching it.

How do they extract knowledge from data?

A brief note about knowledge. Anything we know (for example, that the Earth is round instead of flat) is the result of a series of findings that disproved previous “knowledge”. But if anything we hold to be true could be falsified, we actually only know negative truths: things that definitely are not the case. Essentially, everything else is wishful thinking that has not yet been disproven.

This is disappointing! And rather academic. To convince a client, we need to transform our knowledge about things that are not the case into something actionable, of value. Because of that, data science is not about the data, but about falsifying business hypotheses using data. In the end, the challenge is to formulate the hypotheses in such a way that disproving them actually leads to valuable conclusions.

Unfortunately, this process is full of pitfalls, and the skill of disproving hypotheses may well be the most important skill of a data scientist.

To show you how that works, let’s go through a full-scale hypothesis test.

There is a group of four datasets designed by Francis Anscombe in 1973 that could lead a data analyst into an ugly trap (Fig. 2). According to their summary statistics, all four datasets are described by almost the same least-squares straight line (roughly y = 3 + 0.5x), but in three of the four cases the line is a horribly wrong description of the data. So, without the plots: how do we know?

Fig. 2: Anscombe's quartet with best-fitting straight lines.
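The claim that all four datasets share almost the same line can be checked numerically. The following is a minimal sketch in plain Python. It assumes the standard published values of Anscombe's quartet, which are not listed in this article, and the helper name `fit_line` is made up for illustration. It shows only that the fitted lines agree, not how to detect that a line is wrong.

```python
# Anscombe's quartet (Anscombe, 1973). Assumption: these are the standard
# published values; they are not taken from this article.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # datasets I-III share x
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared). Hypothetical helper name.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Fit each dataset separately and print the results side by side.
fits = {name: fit_line(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, (slope, intercept, r2) in fits.items():
    print(f"{name:>3}: y = {intercept:.2f} + {slope:.3f} x, R^2 = {r2:.2f}")
```

To two decimal places, the four fits are indistinguishable. Slope, intercept and R² alone therefore cannot tell us which of the four lines is a reasonable description of its data.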

 
