Driving Data Science Productivity Without Compromising Quality

Driving Data Science Productivity Without Compromising Quality

Driving Data Science Productivity Without Compromising Quality

How will data science teams maintain quality standards in the face of advancing automation? Attend the IBM DataFirst Launch Event on Sep 27 in NYC and learn how to drive greater productivity from your data science teams without compromising the quality of the mission-critical business assets they produce.

Productivity in data science isn’t a matter of output in any quantitative sense. It’s more an issue of the quality of what data scientists produce.

In a data-science context, quality refers to the validity and relevance of the insights that statistical models are able to distill from the data. As I stated recently, the more data you have, the more stories that data scientists can tell with it, though many of those narratives may be entirely (albeit inadvertently) fictitious.

Given the paramount importance of actionable insights, the productivity of data science teams can’t be neatly reduced to throughput or any other quantitative metric. Data scientists can easily pump up their aggregate output along myriad dimensions, such as more sources, more data, more pipeline processes, more variables, more iterations, and more visualizations. But that doesn’t necessarily get them any closer to delivering high-quality analytics for predictive, prescriptive, and other uses.

Read Also:
From Big Data to Artificial Intelligence: The Next Digital Disruption

Likewise, you can’t always assume that throwing more data scientists at a problem will boost the quality of their collective output. Different data scientists may rely on different data sources; aggregate, cleanse, and sample them in different ways; incorporate different feature sets into their models, use different algorithmic and visualization approaches; employ different metrics of model fitness and predictive capability, and so on. Lack of standardized approaches and consistent documentation may frustrate efforts by third parties to compare which data scientists’ results, if anybody’s, are most valid in any particular instance of them working on a common problem.

As the number and variety of data scientists at work on a problem grows, they may be surfacing more spurious correlations than causal insights. And if they use different methodologies to produce their various results, it may get more difficult to identify who, if anyone, has delivered the highest-quality insights.

If you throw “citizen data scientists” into the mix, the quality problem can easily deteriorate unless you implement safeguard procedures (to be discussed later in this post). Citizen data scientist refers to the new generation of statistical explorers who lack the traditional academic and work backgrounds of established data scientists. Typically, these newbies are self-taught, self-starting, self-sufficient, and use self-service cloud-based statistical modeling and data engineering tools. They tend to use idiosyncratic methods, which may frustrate subsequent efforts by others to assess the validity of their findings.

Read Also:
The Data-Driven Case for Vacation

 



Sentiment Analysis Symposium

27
Jun
2017
Sentiment Analysis Symposium

15% off with code 7WDATA

Read Also:
5 Ways Artificial Intelligence Is Shaping the Future of Ecommerce

Data Analytics and Behavioural Science Applied to Retail and Consumer Markets

28
Jun
2017
Data Analytics and Behavioural Science Applied to Retail and Consumer Markets

15% off with code 7WDATA

Read Also:
Data science underlies everything the enterprise now does

AI, Machine Learning and Sentiment Analysis Applied to Finance

28
Jun
2017
AI, Machine Learning and Sentiment Analysis Applied to Finance

15% off with code 7WDATA

Read Also:
What is a Data Management Platform or DMP?

Real Business Intelligence

11
Jul
2017
Real Business Intelligence

25% off with code RBIYM01

Read Also:
The art of designing data flow on a free-form canvas

Advanced Analytics Forum

20
Sep
2017
Advanced Analytics Forum

15% off with code Discount15

Read Also:
The art of designing data flow on a free-form canvas

Leave a Reply

Your email address will not be published. Required fields are marked *