Blog 8: Statistics Denial Myth #4, Rebranding Predictive Modeling
- by Randy Bartlett
This blog discusses the ambition of rebranding predictive modeling as predictive analytics, and why this will enlarge the coming flood of statistical malfeasance. The prediction problem involves uncertainty, and statistics provides us with the tools, language, and thinking for addressing numbers with uncertainty. We have provided a problem-based clarification of statistics in Analytics Magazine. This should help people better identify statistics problems.
Rebranding & Mischaracterizing Predictive Modeling:
The rebranding of predictive modeling as predictive analytics is on its surface no more harmful than selling 'pre-owned cars' instead of 'used cars.' The harm comes when rebranding mischaracterizes statistics and circumvents best practice. As mentioned before, the concern with rebranding is that the next step is to strip away everything not understood by those merely following recipes.
Here is a quote that captures the problematic mischaracterization: 'PREDICTIVE ANALYTICS does NOT require an understanding of “STATISTICS / TRADITIONAL p-value STATISTICS” …. Period !!!!!' This claim spreads rudimentary misunderstandings about statistics. Here is another quote: 'It [predictive modeling] is not a [sub]field of statistics.'
"The only useful function for a statistician is to make predictions, and thus provide a basis for action." (W. Edwards Deming)
First, prediction has always been a subfield of statistics. We need look no further than the fact that prediction involves uncertainty. Let us recap the four common objectives for statistical models: coefficient estimation, prediction (there it is!), grouping, and ranking. Those new to the subject want to rename and reclassify everything as part of their rediscovery.
Second, note the qualification from the first quote, 'statistics/traditional p-value statistics.' This is like claiming that division does not require an understanding of 'mathematics/traditional addition.'
Here is a third quote: "PA [Predictive Analytics] and DS [Data Science] both contrast with statistics in their emphasis on prediction over causality and their general use of observational in contrast to experimental methods." PA and DS are rebrandings of predictive modeling and statistics, respectively, with no change in content. All of the assumptions, thinking, and tools for dealing with uncertainty are statistical.
First, the claim that predictive analytics emphasizes prediction more than statistics does is just babble. By the same logic, we could claim that predictive modeling emphasizes prediction more than statistics does, and that sampling emphasizes sampling more than statistics does. We could claim that 'Division Scientists' perform more division than Mathematicians. This does not express a new value proposition for predictive analytics. In general, this boils down to a comparison between topical areas like commercial statistics and clinical statistics. This distinction is lost when an applied statistician moves from clinical to commercial.
Second, the same confused type of claim is repeated with the idea that predictive analytics is more about observational data than statistics is. Again, this is like claiming that predictive modeling is more about observational data than statistics is. There is nothing new in this rebranding that is not already in predictive modeling, a subfield of statistics.
Third, statistics places a heavy emphasis on analyzing observational data, and it does this in a number of subfields: predictive modeling, DoS (Design of Samples), QC/PC (Quality Control/Process Control), Time Series, EDA (Exploratory Data Analysis), and others. Hence, statistics is more about observational data than predictive analytics is. Observational data contains uncertainty.
Close:
Predictive analytics is a rebranding of predictive modeling for promotional purposes. We can be certain that prediction is a statistics problem because it involves numbers with uncertainty. Claiming prediction does not require an understanding of statistics is like claiming that division does not require an understanding of mathematics.
Rebranding can have some benefits if performed thoughtfully. However, we think that there is nothing thoughtful or measured in denying the value proposition of statistics. The downside of rebranding is that important parts can be omitted just because they are not understood by recipe followers. We have noticed that self-professed experts in predictive analytics seldom discuss prediction intervals! Corrupting predictive modeling will facilitate a flood of statistical malfeasance.
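For readers who have not met a prediction interval, here is a minimal sketch of one for simple linear regression. It is an illustration under stated assumptions, not a recommended workflow: the data are simulated, the errors are assumed independent and normal with constant variance, and a normal quantile stands in for the Student t quantile that the textbook formula uses (reasonable only for fairly large samples). The function names are our own and do not come from any library.

```python
import math
import random
from statistics import NormalDist, fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def prediction_interval(xs, ys, x0, level=0.95):
    """Approximate prediction interval for a NEW observation at x0.

    Assumption: a normal quantile replaces the t quantile with n - 2
    degrees of freedom, so this is a large-sample approximation.
    """
    n = len(xs)
    intercept, slope = fit_line(xs, ys)
    x_bar = fmean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    # Residual standard error, with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    # The leading 1 accounts for the noise in the new observation itself;
    # the remaining terms account for uncertainty in the fitted line.
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    y_hat = intercept + slope * x0
    return y_hat, y_hat - z * se_pred, y_hat + z * se_pred


# Simulated data (hypothetical): true line 2 + 0.5x with unit-variance noise.
random.seed(0)
xs = [i / 10 for i in range(200)]
ys = [2 + 0.5 * x + random.gauss(0, 1) for x in xs]

point, lower, upper = prediction_interval(xs, ys, x0=10.0)
print(f"point prediction: {point:.2f}")
print(f"approx. 95% prediction interval: ({lower:.2f}, {upper:.2f})")
```

The point prediction alone says nothing about how far off a new observation might be. The interval does, and it widens as x0 moves away from the center of the observed data.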
We sure could use Deming right now. Many of us who consume or produce data analysis hang out in the new LinkedIn group: About Data Analysis. Come see us.
Randy Bartlett
Statistician/Statistical Data Scientist at Blue Sigma Analytics
Posted 8 September 2015