Automated Predictive Analytics – What Could Possibly Go Wrong?

Summary:  Will automated predictive analytics be a boon to professional data scientists, or a dangerous diversion that lets well-meaning, motivated but amateur users attempt predictive analytics?  More on the conversation started last week about the new One-Click Data-In Model-Out platforms.

I have always been very much of the view that data science is best left to data scientists.  But in last week’s article, “Data Scientists Automated and Unemployed by 2025!”, I detailed what my research had turned up: a new trend, and a group of analytic platform developers driven to deliver One-Click Data-In Model-Out functionality.  In other words, fully Automated Predictive Analytics.

In that article we used the popular definition:  Automated Predictive Analytics are services that allow a data owner to upload data and rapidly build predictive or descriptive models with a

There is an element here of trying to make the data scientist more efficient by automating the tasks that are least creative.  Much of that is in data cleansing, normalizing, removing skewness, transforming data for specific algorithm requirements, and even running multiple algorithms in parallel to determine champion models.

But the reality is that these companies are pitching directly to non-data scientists with tag lines like “Data Science Isn’t Rocket Science”.  And the reason they are headed in this direction is that Gartner predicts this ‘Citizen Data Scientist’ market will grow 5X more quickly than the true data scientist market.

It’s also a compelling motive that, while most data scientists have settled on the one or two advanced analytic platforms or tools they prefer, this new and expanding market does not come with that baggage.  Selling here will not require displacing SAS or SPSS or any of the other heavy hitters, since those tools don’t meet the needs of these new users.

What Could Possibly Go Wrong

With these concerns in mind, I set out to catalogue all the things that can go wrong when excited amateurs use sophisticated tools to produce predictive analytics.  I also went back and talked in greater depth to a few of the five One-Click companies I listed in the earlier article, to see whether they acknowledged or were addressing these concerns.  The list of five companies is not intended to be exhaustive, and neither were my repeat interviews.

A fast way to sub-optimize a model is to start out assuming the data is homogeneous.  Data Science 101 says explore the data visually (to avoid the rookie mistakes illustrated by Anscombe’s Quartet) and lead with clustering and segmentation so you don’t mistake whole new markets for simple outliers.
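To make the Anscombe point concrete, here is a minimal sketch using only the Python standard library. The data is Anscombe's published quartet; the helper name pearson and the summary dictionary are my own illustration, not anything taken from the One-Click platforms.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Anscombe's quartet: sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Mean of y and correlation of x and y for each set, rounded to two decimals.
summary = {
    name: (round(mean(ys), 2), round(pearson(xs, ys), 2))
    for name, (xs, ys) in ANSCOMBE.items()
}

for name, (y_mean, r) in summary.items():
    print(f"Set {name}: mean(y)={y_mean}, r={r}")
```

All four sets report the same mean and correlation at this precision, yet a scatter plot of each looks completely different, which is exactly why a summary table is no substitute for looking at the data.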

Some of the One-Clicks have recognized this.  DataRPM leads with segmentation and clustering.  PurePredictive, which just launched, says this is on its development roadmap.

If we limit data cleansing to ensuring that all the data in a column is consistently numeric (no extraneous symbols such as commas) or categorical (a limited and consistent use of alphas), we may actually be better off with an automated system, which is less likely to make human errors.  Like spell check on your phone, however, you’ll want to inspect the assumptions, especially when the system automatically tries to standardize alphas.
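A rough sketch of what such a consistency check might look like, with the audit trail a reviewer would want.  The function name audit_numeric_column, the set of stripped symbols and the sample values are all hypothetical; this is not how any particular One-Click platform does it.

```python
import re

# Assumption: commas, dollar signs and whitespace are the "extraneous symbols".
SYMBOLS = re.compile(r"[,$\s]")


def audit_numeric_column(values):
    """Coerce string values to floats and report what was changed or rejected.

    Returns (cleaned, altered, rejected):
      cleaned  - floats, with None where the value could not be parsed
      altered  - raw values that had symbols stripped before parsing
      rejected - raw values that are not numeric even after stripping
    """
    cleaned, altered, rejected = [], [], []
    for raw in values:
        stripped = SYMBOLS.sub("", raw)
        try:
            cleaned.append(float(stripped))
        except ValueError:
            cleaned.append(None)
            rejected.append(raw)
            continue
        if stripped != raw:
            altered.append(raw)
    return cleaned, altered, rejected


# Hypothetical column as it might arrive in an uploaded file.
column = ["1,200", "950", "$3,400.50", "n/a", " 75 "]
cleaned, altered, rejected = audit_numeric_column(column)
print("cleaned: ", cleaned)
print("altered: ", altered)
print("rejected:", rejected)
```

The altered and rejected lists are the point: they are the assumptions you would want to inspect before trusting the cleaned column.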

So long as we are talking about things like removing skewness or normalizing data for specific algorithms (e.g. normalizing for neural nets), once again I think I would be likely to accept an automated system.
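For illustration only, here is the kind of mechanical transformation I have in mind: a log transform to reduce right skew, followed by min-max scaling to the 0-1 range that neural nets often expect.  The income figures are made up, and the choice of log1p and min-max scaling is my assumption, not a description of any vendor's pipeline.

```python
from math import log1p
from statistics import mean, pstdev


def skewness(xs):
    """Population skewness: the mean cubed z-score."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def min_max(xs):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


# Hypothetical right-skewed feature (e.g. income in thousands).
incomes = [20, 22, 25, 27, 30, 35, 40, 60, 120, 400]

logged = [log1p(x) for x in incomes]  # log(1 + x) compresses the long right tail
scaled = min_max(logged)              # now in [0, 1]

print(f"skewness before: {skewness(incomes):.2f}")
print(f"skewness after:  {skewness(logged):.2f}")
print(f"scaled range:    {min(scaled)} to {max(scaled)}")
```

These steps are rule-driven and easy to verify, which is why handing them to an automated system seems a reasonable trade.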

The One-Click folks are smart enough to impute missing data only where it’s needed by the algorithm (e.g. not for trees).  Beyond that, the techniques at work vary from platform to platform, and a reviewer would want to pay close attention to them.

Techniques range from simple median value replacement to AI-based logic.  You would also want to examine carefully how much missing data triggers a rule to simply ignore the feature.
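As a sketch of the simplest end of that range, the snippet below fills gaps with the column median and drops any feature whose share of missing values exceeds a threshold.  The function name impute_or_drop, the 40% cut-off and the sample data are assumptions chosen for illustration; each platform will have its own rule, which is precisely what you would want to check.

```python
from statistics import median


def impute_or_drop(columns, max_missing=0.4):
    """Median-impute each column, or drop it if too much of it is missing.

    columns: dict mapping feature name -> list of numbers, None marking a gap.
    max_missing: hypothetical cut-off; a feature with a larger share of
                 missing values is ignored rather than imputed.
    Returns (kept, dropped).
    """
    kept, dropped = {}, []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        missing_share = 1 - len(present) / len(values)
        if missing_share > max_missing:
            dropped.append(name)
            continue
        fill = median(present)
        kept[name] = [fill if v is None else v for v in values]
    return kept, dropped


# Hypothetical data: 'age' has one gap, 'income' is mostly empty.
data = {
    "age": [34, None, 29, 41, 38],
    "income": [None, None, None, 52000, None],
}
kept, dropped = impute_or_drop(data)
print("kept:   ", kept)
print("dropped:", dropped)
```

Move the threshold and a feature silently appears in or disappears from the model, so it is worth knowing where your platform sets it.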


