It Seems Like Anyone Can Be a Data Scientist… but Is It True?

Yes, one can learn R and Hadoop and “claim” to be a data scientist, but that’s far from the truth. By comparison, one can also take a few medical classes and claim to be a doctor or watch a few courtroom TV shows and claim to be a lawyer. The difference is that the disciplines of medicine and law are “professionalized.” As a result, they are able to guard their gates by setting standards on who can call themselves a “doctor” or a “lawyer.” In data science, we cannot do that as of yet.

As for R and Hadoop, they’re just part of the data science toolkit. They don’t constitute “data science” any more than a scalpel constitutes “surgery.” In the same way that physics relies upon mathematics, data science relies upon statistical tools for handling large and small data sets, structured and unstructured data, etc. But the mathematics of physics is not a substitute for scientific thinking, analysis, approach or method, and neither are Hadoop and R substitutes for understanding behaviour in data.

Statistics, specifically, is concerned largely with methods for testing hypotheses using data; thus, before one can constructively use Hadoop or R, one needs to know statistics and know it well. But statistics is concerned largely with testing hypotheses and stops there, whereas data science focuses on the implications of systematic departures from hypotheses (as evidenced by statistical tests) and the bigger conclusions we can draw as a result of those departures.

Moreover, apart from requiring a cumulative knowledge of numerous tools and sub-disciplines, such as statistics, R and Hadoop, data science requires one to bring those tools to bear in answering important business questions and achieving business outcomes, neither of which follows directly from knowledge of the tools. That ability, whether it comes from skill, experience or talent, is what the data scientist brings to the table, and it is what justifies calling oneself a “data scientist.”

This leads me to believe that the real question here is, “Can anyone BE a data scientist?” And to that I would say no, not at all, for the very reasons I just mentioned. In my experience, not even top CS or STEM majors from a top school can easily become good data scientists without additional training and certain personal qualities. Apart from its multidisciplinary nature, data science requires a deep love of the divergence between observed reality in data and the predictions of mathematical models. To have that, one needs something more than just a mastery of tools. One needs a love for imperfection.

I’ve been in this field almost 20 years, from back before the term “data science” existed, so I’ve seen many things. In fact, I believe excellence in data science requires a number of years of actually applying it before one can truly understand data, how it behaves, and how different models work, backwards and forwards. Most importantly, excellence requires making mistakes and understanding them, along with appreciating the variations between observed and predicted reality. Thus, I affectionately call data science the science for imperfect people, like myself.

I say that half-jokingly. In truth, I believe all good science is for imperfect people: people who become curious, not angry, when they see imperfection and variation. STEM majors who cannot stand imperfection and variation will never make good scientists or good data scientists, much as bigots cannot make good neighbours. Why? Because the world we live in is imperfect and variable, and its beauty lies in that imperfection and variability.
