Data science without statistics is possible, even desirable

Data science without statistics is possible

The purpose of this article is to clarify a few misconceptions about data and statistical science.

I will start with a controversial statement: data science barely uses statistical science and techniques. The truth is actually more nuanced, as explained below.

But the new statistical science in question is not regarded as statistics, by many statisticians. I don't know how to call it, "new statistical science" is a misnomer, because it is not all that novel. And it is regarded by statisticians as dirty data processing, not elegant statistics.

It contains topics such as

I have sometimes used the word rebel statistics  to describe these methods.

While I consider these topics to be statistical science (I contributed to many of them myself, and my background is in computational statistics), most statisticians I talked to do not see it as statistical science. And calling this stuff statistics only creates confusion, especially for hiring managers.

Some people call it statistical learning. One of the precursors of this type of methods is Trevor Hastie who wrote one of the first data science books, called The Elements of Statistical Learning.

2. Data science uses a bit of old statistical science

Including the following topics, which curiously enough, are not found in standard statistical textbooks:

These techniques can be summarized in one page, and time permitting, I will write that page and call it "statistics cheat sheet for data scientists". Interestingly, from a typical 600-pages textbook on statistics, about 20 pages are relevant to data science, and these 20 pages can be compressed in 0.25 page. For instance, I believe that you can explain the concept of random variable and distribution (at least what you need to understand to practice data science) in about 4 lines, rather than 150 pages. The idea is to explain it in plain English with a few examples, and defining distribution as the expected (based on model) or  limit of a frequency distribution (histogram).

Funny fact: some of these classic stats texbooks still feature tables of statistical distributions in an appendix. Who still use such tables for computations? Not a data scientist, for sure. Most programming languages offer libraries for these computations, and you can even code it yourself in a couple of lines of code. A book such as numerical recipes in C++ can prove useful, as it provides code for many statistical functions; see also our source code section on DSC, where I plan to add more modern implementations of statistical techniques, some even available as Excel formulas.

In particular, OLS (ordinary least squares) , Monte-Carlo techniques, mathematical optimization, the simplex algorithm, inventory and pricing management models.

 

Share it:
Share it:

[Social9_Share class=”s9-widget-wrapper”]

One thought on “Data science without statistics is possible, even desirable

  1. This blog was debunked … thoroughly. It is ridiculous. E.g., it fails to name a single technique for data analysis, which is not statistics. The author makes glaring errors in regards to the nature of statistics and the techniques.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

You Might Be Interested In

Big Data Explained in Less Than 2 Minutes

30 Oct, 2016

There are some things that are so big that they have implications for everyone, whether we want them to or …

Read more

Your Data Is Biased, Here’s Why

23 Oct, 2017

Ransomware is one of the fastest growing types of malware, and new breeds that escalate quickly ar Biased data can …

Read more

Top 5 benefits of real-time business intelligence for utilities

6 Apr, 2017

Real-time business intelligence solutions are expected to be game changers for asset-intensive and field-force-driven enterprises, such as utilities. These solutions …

Read more

Recent Jobs

IT Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Data Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Applications Developer

Washington D.C., DC, USA

1 May, 2024

Read More

D365 Business Analyst

South Bend, IN, USA

22 Apr, 2024

Read More

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

Get the 3 STEPS

To Drive Analytics Adoption
And manage change

3-steps-to-drive-analytics-adoption

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.