Driving Value with Data Science

Fighting fraud, reducing customer churn, improving the bottom line – these are just a few of the promises of data science. Today, we have more data to work with than ever before, thanks to new data-generating technologies like smart meters, vehicle telemetry, RFID, and intelligent sensors.

But with all that data, are we driving equivalent value? Many data scientists say they spend most of their time as “data janitors”: combining data from many sources, dealing with complex formats, and cleaning up dirty data.

Data scientists also say they spend a lot of time serving as “plumbers” – handling DevOps and managing the analytics infrastructure. Time devoted to data wrangling and DevOps is a dead loss; it reduces the amount of time data scientists can spend delivering real value to clients.

Small data tools. Data analytics software introduced before 2012 runs on single machines only; this includes most commercial software for analytics as well as open source R and Python. When the volume of data exceeds the capacity of the computer, runtime performance degrades or jobs fail. Data scientists working with these tools must invest time in workarounds, such as sampling, filtering or aggregating. In addition to taking time, these techniques reduce the amount of data available for analysis, which affects quality.
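To make the sampling and aggregating workarounds concrete, here is a minimal sketch in plain Python. It assumes the data arrives as a stream too large to hold in memory; the function names `reservoir_sample` and `streaming_mean` are illustrative, not taken from any particular analytics package. The sample keeps only `k` rows, which is exactly the loss of data the paragraph above describes.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Keep a uniform random sample of at most k rows from a stream.

    Only k rows are held in memory at any time (reservoir sampling,
    "Algorithm R"), so the stream itself can be arbitrarily long.
    """
    rng = random.Random(seed)
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


def streaming_mean(values: Iterable[float]) -> float:
    """Aggregate a stream in one pass without materializing it."""
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    return total / count if count else float("nan")


if __name__ == "__main__":
    # Hypothetical sensor readings, generated lazily rather than loaded.
    readings = (float(i % 100) for i in range(1_000_000))
    subset = reservoir_sample(readings, k=1_000)
    print(len(subset), "rows kept out of 1,000,000")
```

Any model fitted on `subset` sees one row in a thousand, which is the quality trade-off a single-machine tool forces.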

Complex and diverse data sources. Organizations use a wide variety of data management platforms to manage the flood of Big Data, including relational databases; Hadoop; NoSQL data stores; cloud storage; and many others. These platforms are often “siloed” from one another. The data in those platforms can be structured, semi-structured and unstructured; static and streaming; cleansed and uncleansed. Legacy analytic software is not designed to handle complex data; the user must use other tools, such as Hive or Pig, or write custom code.

Single-threaded software. Legacy software scales up, not out. If you want more computing power, you’ll have to buy a bigger machine. In addition to limiting the amount of data you can analyze, it also means that tasks run serially, one after the other. For a complex task, that can take days or even weeks.
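The difference between running tasks serially and spreading them across workers can be sketched with Python's standard `concurrent.futures` module. This is an illustration under stated assumptions: the helper names (`partition`, `partial_sum_of_squares`, `run_serial`, `run_scaled_out`) are hypothetical, and a thread pool is used only so the sketch runs anywhere. A CPU-bound job would more likely use a process pool or a cluster framework, which follow the same split, map and combine pattern.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import List, Sequence


def partition(data: Sequence[int], n_parts: int) -> List[Sequence[int]]:
    """Split data into at most n_parts contiguous chunks."""
    size = max(1, -(-len(data) // n_parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum_of_squares(chunk: Sequence[int]) -> int:
    """The unit of work: it needs only its own chunk."""
    return sum(x * x for x in chunk)


def run_serial(chunks: List[Sequence[int]]) -> int:
    """Legacy style: one chunk after the other on a single worker."""
    return sum(partial_sum_of_squares(c) for c in chunks)


def run_scaled_out(chunks: List[Sequence[int]], executor: Executor) -> int:
    """Scale-out style: hand chunks to a pool, then combine the results."""
    return sum(executor.map(partial_sum_of_squares, chunks))


if __name__ == "__main__":
    data = list(range(1_000))
    chunks = partition(data, n_parts=4)
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(run_serial(chunks) == run_scaled_out(chunks, pool))
```

Because each chunk is independent, adding workers (or machines) shortens the wall-clock time without changing the result, which is what scaling out means in practice.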

Complex infrastructure. Jeff Magnusson, Director of Algorithms Platform at online retailer Stitch Fix, notes that data science teams typically include groups of engineers who spend most of their time keeping the infrastructure running. Data science teams often manage their own platforms because clients have urgent needs, the technology is increasingly sophisticated, and corporate IT budgets are lean.

It doesn’t make sense to hire highly paid employees with skills in advanced analytics, then put them to work cleaning up data and managing clusters. Visionary data scientists seek tools and platforms that are scalable; interoperable with Big Data platforms; distributed; and elastic.

Scalability. Some academics question the value of working with large datasets. For data scientists, however, the question is moot; you can’t escape using large datasets even if you agree with the academics. Why? Because the data you need for your analysis comes from a growing universe of data; and, if you build a predictive model, your organization will need to score large volumes of data. You don’t have a choice; large datasets are a fact of life, and your tools must reflect this reality.

Integrated with Big Data platforms. As a data scientist, you may have little or no control over the structure of the data you need to analyze or the platforms your organization uses to manage data.

 


