Automated Predictive Analytics – What Could Possibly Go Wrong?

Summary: Will Automated Predictive Analytics be a boon to professional data scientists, or a dangerous diversion that lets well-meaning, motivated, but amateur users try their hand at predictive analytics? More on the conversation started last week about new One-Click Data-In Model-Out platforms.

I have always been very much of the view that data science is best left to data scientists. But in the research leading up to last week’s article, “Data Scientists Automated and Unemployed by 2025!”, I found and detailed a new trend: a group of analytic platform developers driven to deliver One-Click Data-In Model-Out functionality. In other words, fully Automated Predictive Analytics.

In that article we used the popular definition:  Automated Predictive Analytics are services that allow a data owner to upload data and rapidly build predictive or descriptive models with a

There is an element here of trying to make the data scientist more efficient by automating the least creative tasks. Much of that work lies in data cleansing, normalizing, removing skewness, transforming data for specific algorithm requirements, and even running multiple algorithms in parallel to determine champion models.
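The champion-model step in the paragraph above can be sketched in a few lines: fit several candidate models on the same training split, score each on a holdout, and keep the best. This is a minimal, standard-library sketch under stated assumptions (a single numeric feature, mean squared error as the scoring rule, and three toy candidates: a mean baseline, a least-squares line, and k-nearest neighbours). It runs the candidates one after another rather than in parallel, and it is not how any particular One-Click platform works; every function name here is hypothetical.

```python
import random
import statistics

# Toy candidate "algorithms" (hypothetical): each fit_* takes (x, y) training
# pairs and returns a function that predicts y from x.
def fit_mean(train):
    """Baseline: always predict the training mean of y."""
    m = statistics.mean(y for _, y in train)
    return lambda x: m

def fit_linear(train):
    """Ordinary least-squares line y = a + b*x."""
    mx = statistics.mean(x for x, _ in train)
    my = statistics.mean(y for _, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    b = sum((x - mx) * (y - my) for x, y in train) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def fit_knn(train, k=3):
    """k-nearest-neighbours regression on a single numeric feature."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return statistics.mean(y for _, y in nearest)
    return predict

def mse(model, data):
    """Mean squared error of a fitted model on (x, y) pairs."""
    return statistics.mean((model(x) - y) ** 2 for x, y in data)

def pick_champion(data, candidates, holdout_frac=0.3, seed=0):
    """Fit every candidate on the same training split and keep the one with
    the lowest holdout MSE (sequentially here; a platform might parallelize)."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - holdout_frac))
    train, holdout = rows[:cut], rows[cut:]
    scores = {name: mse(fit(train), holdout) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

data = [(x, 2 * x + 1) for x in range(50)]  # noise-free linear toy data
champion, scores = pick_champion(
    data, {"mean": fit_mean, "linear": fit_linear, "knn": fit_knn}
)
print(champion, {name: round(s, 3) for name, s in scores.items()})
```

On this noise-free linear toy data the least-squares line should be chosen as champion; with real data the winner depends on the data, which is exactly why the comparison is automated.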

But the reality is that these companies are pitching directly to non-data scientists, with taglines like “Data Science Isn’t Rocket Science”. They are headed in this direction because Gartner predicts that this ‘Citizen Data Scientist’ market will grow five times faster than the market for true data scientists.

There is also a compelling commercial motive. Most data scientists have already settled on the one or two advanced analytic platforms or tools they prefer, but this new, expanding market comes without that baggage. Selling here will not require displacing SAS, SPSS, or any of the other heavy hitters, since those tools don’t meet the needs of these new users.

What Could Possibly Go Wrong

With these concerns in mind, I set out to catalogue everything that can go wrong when excited amateurs use sophisticated tools to produce predictive analytics. I also went back and talked in greater depth with a few of the five One-Click companies I listed in the earlier article, to see whether they acknowledged or were addressing these concerns. That list of five companies was not intended to be exhaustive, and neither were my follow-up interviews.

A fast way to sub-optimize a model is to start out assuming the data is homogeneous. Data Science 101 says to explore the data visually (to avoid the rookie mistakes illustrated by Anscombe’s Quartet) and to lead with clustering and segmentation, so you don’t mistake whole new markets for simple outliers.
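Anscombe’s Quartet makes the point concretely: four small datasets share almost the same means, variances, correlation, and regression line, yet plotted they look nothing alike (one roughly linear, one curved, one linear with a single outlier, one a vertical stack with a single high-leverage point). The standard-library sketch below uses the published quartet values and computes those summaries; a pipeline that looked only at such numbers would treat all four as the same problem.

```python
import statistics

# Anscombe's quartet (Anscombe, 1973); datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I":   (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(xs, ys):
    """Means, sample variances, Pearson r, and the least-squares line y = a + b*x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": statistics.variance(xs),
        "var_y": statistics.variance(ys),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }

for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(name, {k: round(v, 2) for k, v in stats.items()})
```

The printed rows agree to roughly two decimal places, which is the whole trap: only a plot reveals the differences.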

Some of the One-Clicks have recognized this. DataRPM leads with segmentation and clustering. PurePredictive, which has just launched, says this is on its development roadmap.
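A segmentation-first step of the kind described above can be sketched with plain k-means. This standard-library version is a minimal sketch under stated assumptions: numeric features, a known number of segments k, and a deterministic farthest-point initialisation (chosen only so the example is reproducible). Real platforms must also choose k and scale features, which this skips. The two “customer segments” below are invented toy data, and the function names are hypothetical.

```python
import math

def init_farthest(points, k):
    """Deterministic farthest-point initialisation: start from the first point,
    then repeatedly add the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iters=100):
    """Lloyd's k-means on equal-length numeric tuples; returns (centroids, labels)."""
    centroids = init_farthest(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, labels

# Invented toy data: (visits per month, basket size) for two customer groups.
small_baskets = [(1, 2), (2, 1), (1.5, 1.5), (2, 2), (1, 1)]
big_baskets = [(9, 10), (10, 9), (9.5, 9.5), (10, 10), (9, 9)]
centroids, labels = kmeans(small_baskets + big_baskets, k=2)
print(labels)     # → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(centroids)  # → [(1.5, 1.5), (9.5, 9.5)]
```

Pooled together, the big-basket customers might look like outliers against a single average; once segmented, each group is perfectly ordinary within its own cluster.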

As for data cleansing, if we limit it to ensuring that all the data in a column is consistently numeric (no extraneous symbols such as commas) or consistently categorical (a limited and consistent set of text values, or “alphas”), we may actually be better off with an automated system, which is less likely to make human errors. As with spell check on your phone, however, you’ll want to inspect its assumptions, especially when it automatically tries to standardize alphas.
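To illustrate cleansing that is safe to automate, and why its assumptions still need inspecting, here is a minimal sketch assuming that commas are only ever thousands separators and that a reviewer supplies the mapping for text labels. The function names and the mapping are hypothetical. The key design choice is that each function returns a change log alongside the cleaned values, so a person can audit every rewrite the way you would review a phone’s autocorrections.

```python
import re

def clean_numeric(values):
    """Parse strings as numbers after stripping thousands-separator commas and
    stray whitespace. Unparseable entries become None instead of a guess.
    Returns (cleaned, changes) so every rewrite can be audited."""
    cleaned, changes = [], []
    for raw in values:
        text = re.sub(r"[,\s]", "", raw)  # assumes ',' is never a decimal mark
        try:
            value = float(text)
        except ValueError:
            value = None
        if text != raw or value is None:
            changes.append((raw, value))
        cleaned.append(value)
    return cleaned, changes

def standardize_labels(values, canonical):
    """Map free-text category labels onto a reviewer-supplied vocabulary after
    collapsing whitespace and case-folding. Unknown labels pass through as-is."""
    cleaned, changes = [], []
    for raw in values:
        key = " ".join(raw.split()).casefold()
        value = canonical.get(key, raw)
        if value != raw:
            changes.append((raw, value))
        cleaned.append(value)
    return cleaned, changes

amounts, amount_log = clean_numeric(["1,200", "350", " 42 ", "n/a"])
cities, city_log = standardize_labels(
    ["NY", "new york", "New York ", "Boston"],
    {"ny": "New York", "new york": "New York"},  # hypothetical mapping
)
print(amounts)     # → [1200.0, 350.0, 42.0, None]
print(amount_log)  # → [('1,200', 1200.0), (' 42 ', 42.0), ('n/a', None)]
print(city_log)    # → [('NY', 'New York'), ('new york', 'New York'), ('New York ', 'New York')]
```

The comma assumption is exactly the kind a reviewer should check: in many European locales “1,200” means 1.2, not 1200.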

So long as we are talking about transformations such as removing skewness, or normalizing data for the algorithms that require it (e.g., neural nets), I would once again be inclined to accept an automated system.
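Both transformations are mechanical enough to automate. A minimal sketch, assuming a strictly positive, right-skewed column: a log transform to reduce skewness (measured with the Fisher-Pearson skewness coefficient) followed by z-score standardization, a common choice before training neural nets. The data values and function names are illustrative only.

```python
import math
import statistics

def skewness(xs):
    """Fisher-Pearson coefficient of skewness, g1 = m3 / m2**1.5 (population moments)."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - m) / sd) ** 3 for x in xs) / len(xs)

def log_transform(xs):
    """Natural-log transform to pull in a long right tail; needs strictly
    positive values (math.log1p is a common alternative when zeros occur)."""
    if min(xs) <= 0:
        raise ValueError("log transform needs strictly positive values")
    return [math.log(x) for x in xs]

def zscore(xs):
    """Standardize to mean 0 and population standard deviation 1."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    if sd == 0:
        raise ValueError("constant column cannot be standardized")
    return [(x - m) / sd for x in xs]

raw = [1, 2, 4, 8, 16, 32, 64, 128, 256]  # illustrative right-skewed column
logged = log_transform(raw)
scaled = zscore(logged)
print(round(skewness(raw), 2), round(skewness(logged), 2))
```

Because each value doubles the last, the log turns this column into evenly spaced values, so its skewness falls from clearly positive to essentially zero; the z-score step then only rescales and does not change the shape.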

The One-Click folks are smart enough to impute missing data only where the algorithm requires it (e.g., not for decision trees). Beyond that, the techniques at work vary from platform to platform, and a reviewer would want to pay attention to them.

Techniques range from simple median value replacement to AI-based logic. You would also want to examine carefully how much missing data triggers a rule to simply drop the feature.
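The two settings a reviewer should check here, the imputation method and the missing-data share that causes a feature to be dropped, can be made explicit in a short sketch. It assumes numeric columns with None marking missing values, median replacement (the simplest technique mentioned above), and a hypothetical 40% drop threshold; actual platforms choose their own methods and cutoffs, and the function name is illustrative.

```python
import statistics

def handle_missing(columns, max_missing=0.4):
    """Median-impute numeric columns (None marks a missing value), but drop any
    column whose missing share exceeds max_missing (a hypothetical policy).
    Returns (kept, report) so a reviewer can see which rule fired per feature."""
    kept, report = {}, {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        share = 1 - len(present) / len(values)
        if share > max_missing or not present:
            report[name] = f"dropped ({share:.0%} missing)"
        elif share == 0:
            kept[name] = list(values)
            report[name] = "complete"
        else:
            fill = statistics.median(present)
            kept[name] = [fill if v is None else v for v in values]
            report[name] = f"imputed {share:.0%} with median {fill}"
    return kept, report

columns = {
    "age": [34, None, 29, 41, 38],
    "income": [None, None, None, 52.0, None],
    "tenure": [1, 3, 2, 5, 4],
}
kept, report = handle_missing(columns)
print(report)
# → {'age': 'imputed 20% with median 36.0', 'income': 'dropped (80% missing)', 'tenure': 'complete'}
```

Returning a per-feature report, rather than silently dropping columns, is what lets a reviewer notice that a potentially important feature such as income was discarded by the threshold rule.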
