Hypothesis driven thinking in data science
- by 7wData
“In God we trust! All others bring data,” is a popular quote attributed to W. Edwards Deming, a physicist by training and a pioneer in quality management. By applying statistical methods from the natural sciences to manufacturing in the first half of the 20th century, he was able to drastically increase the industry’s efficiency. In a way, Deming was a predecessor of modern data scientists, because he applied the methodology of the natural sciences to business.
Since data science is a comparatively new field, there are probably more definitions of data science than there are data scientists. In my opinion, data science is not about data, but about a certain way of thinking about data. Today, I will give an example of one of the most frequent tasks of data scientists, and show why you don’t need to be a data scientist to follow the essence of their work in a project.
The process of converting data into meaningful information is nowadays called Business Intelligence. It helps businesses to report data and understand ‘what’, ‘where’, ‘when’ and ‘how much’.
Using data in decision making, by tracking Key Performance Indicators (KPIs) and predicting their future values, is a more sophisticated use of data than reporting it. Such an application of data in the business context is usually termed (business) data analytics. It guides businesses on how to go further and understand the ‘what next’.
Fig. 1: Pyramid of Data Science, Data Analytics and Business Intelligence with their "added values".
However, at no point have we really addressed the ‘why’. From my point of view, understanding – meaning answering the ‘why’ by finding reasons and extracting knowledge using data – is an entirely different process and its owners are data scientists.
In other words, data scientists try to understand data instead of just crunching it.
How do they extract knowledge from data?
A brief note about knowledge. Anything we know (for example, that the Earth is round instead of flat) is the result of a series of findings that disproved previous “knowledge”. But if anything that we think to be true could be falsified, we actually only know negative truths: things that definitely are not the case. Essentially, everything else is just wishful thinking that has not yet been disproven.
This is disappointing! And rather academic. To convince a client, we need to transform our knowledge about things that are not the case into something actionable, of value. Because of that, data science is not about the data, but about falsifying business hypotheses using data. In the end, the challenge is to formulate the hypotheses in such a way that disproving them actually leads to valuable conclusions.
Unfortunately, this process is full of pitfalls, and the skill of disproving hypotheses may be the most important skill of a data scientist.
To show you how that works, let’s go through a full-scale hypothesis test.
There is a group of four datasets, designed by Francis Anscombe in 1973 and known as Anscombe's quartet, that could lead a data analyst into an ugly trap (Fig. 2). According to their summary statistics, all four datasets are described by almost the same straight line, but in three of the four cases the line is horribly wrong. So, without the plots: how do we know?
Fig. 2: Anscombe's four datasets with best-fitting straight lines.
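The claim that all four datasets share almost the same straight line can be checked numerically. The following is a minimal sketch in Python, assuming the published values of Anscombe's quartet and an ordinary least-squares fit; the names QUARTET, fit_line and SUMMARY are illustrative and not part of the original article.

```python
from statistics import mean

# Anscombe's quartet (1973): datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, Pearson r)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Fit every dataset and compare the resulting lines.
SUMMARY = {name: fit_line(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, (slope, intercept, r) in SUMMARY.items():
    print(f"{name}: y = {intercept:.2f} + {slope:.3f} * x, r = {r:.3f}")
```

For all four datasets the fit comes out at roughly y = 3.0 + 0.5 x with a correlation of about 0.82, so these numbers alone cannot tell the datasets apart. That is exactly the trap.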