Why Data Science Isn’t an Exact Science
- by 7wData
Organizations adopt data science with the goal of getting answers to more types of questions, but those answers are not absolute.
Business professionals have traditionally viewed the world in concrete terms and sometimes even in round numbers. That legacy perspective is black and white compared to the shades of gray that data science produces. Instead of a single-number result such as 40%, data science produces a probabilistic result that combines a level of confidence with a margin of error. (The statistical calculations are far more complex than that, of course.)
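To make "a level of confidence with a margin of error" concrete, here is a minimal Python sketch that reports a proportion as an estimate plus a margin of error. It assumes a hypothetical survey in which 40 of 100 respondents answered "yes", and it uses the simple normal-approximation (Wald) interval, which is only one of several ways to compute such an interval and is unreliable for small samples or proportions near 0 or 1.

```python
import math
from statistics import NormalDist


def proportion_with_margin(successes: int, trials: int, confidence: float = 0.95):
    """Return (estimate, margin_of_error, (low, high)) for a proportion.

    Uses the normal-approximation (Wald) interval, a simplification that
    behaves poorly for small samples or proportions near 0 or 1.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p_hat = successes / trials
    # Critical value from the standard normal, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / trials)
    margin = z * standard_error
    return p_hat, margin, (p_hat - margin, p_hat + margin)


if __name__ == "__main__":
    # Hypothetical survey: 40 of 100 respondents said "yes".
    estimate, margin, (low, high) = proportion_with_margin(40, 100)
    print(f"{estimate:.0%} +/- {margin:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Run as a script, this prints `40% +/- 9.6% (95% CI 30.4% to 49.6%)`. The "40%" is the same single number a traditional report would show; the margin of error is what tells a decision-maker how much weight that number can bear, and it shrinks as the sample grows.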
While two numbers are arguably twice as complicated as one, confidence and error probabilities help non-technical decision-makers:
In fact, there are several reasons why data science isn't an exact science, some of which are described below.
"When we're doing data science effectively, we're using statistics to model the real world, and it's not clear that the statistical models we develop accurately describe what's going on in the real world," said Ben Moseley, associate professor of operations research at Carnegie Mellon University's Tepper School of Business. "We might define some probability distribution, but it isn't even clear the world acts according to some probability distribution."
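Moseley's point about assumed probability distributions can be illustrated with a small, hypothetical simulation. The sketch below assumes the "real world" produces exponentially distributed waiting times, while the analyst models the data as normally distributed. The function name, threshold, and distributions are illustrative assumptions, not taken from the article.

```python
import random
from statistics import NormalDist


def compare_tail_estimates(n: int = 20_000, threshold: float = 3.0, seed: int = 7):
    """Compare a real tail probability with one predicted by a mismatched model.

    Hypothetical setup: the data are exponential (mean 1), but the model
    assumes a normal distribution fitted to the same samples.
    """
    rng = random.Random(seed)
    # The "real world" in this toy example: exponential waiting times.
    data = [rng.expovariate(1.0) for _ in range(n)]

    # Modeling assumption: the data are normal, with the sample mean and stdev.
    model = NormalDist.from_samples(data)

    # Share of observed values above the threshold (booleans sum as 0/1).
    empirical_tail = sum(x > threshold for x in data) / n
    # Share the normal model predicts above the threshold.
    modeled_tail = 1 - model.cdf(threshold)
    # Probability the model assigns to negative waiting times, which cannot occur.
    modeled_negative = model.cdf(0.0)
    return empirical_tail, modeled_tail, modeled_negative


if __name__ == "__main__":
    empirical, modeled, negative = compare_tail_estimates()
    print(f"observed P(X > 3): {empirical:.3f}")
    print(f"normal model P(X > 3): {modeled:.3f}")
    print(f"normal model P(X < 0): {negative:.3f}")
```

Under these assumptions, the normal model underestimates how often large values occur (roughly half the observed rate) and assigns substantial probability to impossible negative values, even though its mean and standard deviation match the data.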
You may or may not have all the data you need to answer a question. Even if you have all the data you need, there may be data quality problems that could cause biased, skewed, or otherwise undesirable outcomes. Data scientists call this "garbage in, garbage out."
According to Gartner, "Poor data quality destroys business value" and costs organizations an average of $15 million per year in losses.
If you lack some of the data you need, the results will be inaccurate because the data doesn't accurately represent what you're trying to measure. You may be able to get the data from an external source, but bear in mind that third-party data may also suffer from quality problems. A current example is COVID-19 data, which different sources record and report differently.
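Missing data causes the most damage when what goes missing is related to the value being measured. The following Python sketch is a hypothetical simulation, not real data: values are drawn around 100, and the chance that a record is missing rises with its value (for example, larger accounts being reported less reliably). The function name and the missingness rule are illustrative assumptions.

```python
import random
import statistics


def simulate_missing_not_at_random(n: int = 10_000, seed: int = 42):
    """Return (true_mean, observed_mean, records_kept) for a toy dataset.

    Hypothetical setup: values ~ Normal(100, 15), and a record's chance of
    being missing increases with its value, capped at 90%.
    """
    rng = random.Random(seed)
    population = [rng.gauss(100, 15) for _ in range(n)]

    observed = []
    for value in population:
        # Probability of being missing grows linearly with the value,
        # clamped to the range [0, 0.9].
        p_missing = min(max((value - 70) / 100, 0.0), 0.9)
        if rng.random() >= p_missing:
            observed.append(value)

    return statistics.fmean(population), statistics.fmean(observed), len(observed)


if __name__ == "__main__":
    true_mean, observed_mean, kept = simulate_missing_not_at_random()
    print(f"true mean {true_mean:.1f}, mean of surviving records {observed_mean:.1f} "
          f"({kept} of 10000 records kept)")
```

Collecting more records under the same process does not remove the gap between the two means; it only makes the biased estimate look more precise, which echoes Moseley's point that more data cannot fix bad data.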
"If you don't give me good data, it doesn't matter how much of that data you give me. I'm never going to extract what you want out of it," said Moseley.
It's been said that if one wants better answers, one should ask better questions. Better questions come from data scientists working together with domain experts to frame the problem. Other considerations include assumptions, available resources, constraints, goals, potential risks, potential benefits, success metrics, and the form of the question.
"Sometimes it's unclear what is the right question to ask," said Moseley.
Data science is sometimes viewed as a panacea or magic. It's neither.
"There are significant limitations to data science [and] machine learning," said Moseley. "We take a real-world problem and turn it into a clean mathematical problem, and in that transformation, we lose a lot of information because you have to streamline it somehow to focus on the key aspects of the problem."
A model may work very well in one context and fail miserably in another.
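One common way this happens is when a model is applied to inputs unlike those it was built on. The Python sketch below assumes a hypothetical curved relationship (y = x squared), fits a straight line using only inputs between 0 and 1, and then measures the error on inputs between 3 and 4. The ranges, the true relationship, and the helper names are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, xs, ys):
    """Average absolute gap between the line's predictions and the truth."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


def true_process(x):
    # Hypothetical "real world": the relationship is curved, not linear.
    return x * x


def evaluate_shift():
    """Return (error in the original context, error in a shifted context)."""
    train_x = [i / 100 for i in range(101)]        # context A: x in [0, 1]
    shifted_x = [3 + i / 100 for i in range(101)]  # context B: x in [3, 4]
    model = fit_line(train_x, [true_process(x) for x in train_x])
    in_context = mean_abs_error(model, train_x, [true_process(x) for x in train_x])
    out_of_context = mean_abs_error(model, shifted_x, [true_process(x) for x in shifted_x])
    return in_context, out_of_context


if __name__ == "__main__":
    in_context, out_of_context = evaluate_shift()
    print(f"mean error where the model was built: {in_context:.3f}")
    print(f"mean error in the new context:        {out_of_context:.3f}")
```

The straight line is a reasonable simplification over the narrow range it was fitted on, but it misses by about 9 on average in the shifted range: the simplification that made the problem tractable is exactly what breaks when the context changes.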