Driving Data Science Productivity Without Compromising Quality

How will data science teams maintain quality standards in the face of advancing automation? Attend the IBM DataFirst Launch Event on Sep 27 in NYC and learn how to drive greater productivity from your data science teams without compromising the quality of the mission-critical business assets they produce.

Productivity in data science isn’t a matter of output in any quantitative sense. It’s more an issue of the quality of what data scientists produce.

In a data-science context, quality refers to the validity and relevance of the insights that statistical models are able to distill from the data. As I stated recently, the more data you have, the more stories data scientists can tell with it, though many of those narratives may be entirely (albeit inadvertently) fictitious.

Given the paramount importance of actionable insights, the productivity of data science teams can’t be neatly reduced to throughput or any other quantitative metric. Data scientists can easily pump up their aggregate output along myriad dimensions, such as more sources, more data, more pipeline processes, more variables, more iterations, and more visualizations. But that doesn’t necessarily get them any closer to delivering high-quality analytics for predictive, prescriptive, and other uses.

Likewise, you can’t always assume that throwing more data scientists at a problem will boost the quality of their collective output. Different data scientists may rely on different data sources; aggregate, cleanse, and sample them in different ways; incorporate different feature sets into their models; use different algorithmic and visualization approaches; employ different metrics of model fitness and predictive capability; and so on. When several data scientists work on a common problem, the lack of standardized approaches and consistent documentation may frustrate efforts by third parties to determine whose results, if anybody’s, are most valid.

As the number and variety of data scientists at work on a problem grow, they may surface more spurious correlations than causal insights. And if they use different methodologies to produce their various results, it may become more difficult to identify who, if anyone, has delivered the highest-quality insights.
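To make the point about spurious correlations concrete, here is a minimal sketch, not a rigorous statistical test. It generates a target and a set of candidate features that are all pure random noise, then counts how many features happen to correlate with the target above a cutoff. The function names, the 50-row sample size, and the 0.3 cutoff are illustrative assumptions, not figures from any real project.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def count_spurious(n_features, n_rows=50, threshold=0.3, seed=0):
    """Count noise features whose |correlation| with a noise target
    meets the threshold. Every hit is spurious by construction,
    because the features and the target are generated independently.
    (n_rows, threshold, and seed are illustrative assumptions.)"""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        if abs(pearson(feature, target)) >= threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    # More candidate variables means more chances for noise to look like signal.
    for k in (10, 100, 1000):
        print(k, "candidate features ->", count_spurious(k), "spurious hits")
```

Because none of these features has any real relationship to the target, every "finding" the script reports is an artifact of looking in more places, which is the risk when output grows faster than scrutiny.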

If you throw “citizen data scientists” into the mix, the quality problem can easily worsen unless you implement safeguard procedures (to be discussed later in this post). The term “citizen data scientist” refers to the new generation of statistical explorers who lack the traditional academic and work backgrounds of established data scientists. Typically, these newbies are self-taught, self-starting, and self-sufficient, and they use self-service cloud-based statistical modeling and data engineering tools. They tend to use idiosyncratic methods, which may frustrate subsequent efforts by others to assess the validity of their findings.
