Driving Value with Data Science


Fighting fraud, reducing customer churn, improving the bottom line – these are just a few of the promises of data science. Today, we have more data to work with than ever before, thanks to new data-generating technologies like smart meters, vehicle telemetry, RFID, and intelligent sensors.

But with all that data, are we driving equivalent value? Many data scientists say they spend most of their time as “data janitors,” combining data from many sources, dealing with complex formats, and cleaning up dirty data.

Data scientists also say they spend a lot of time serving as “plumbers” – handling DevOps and managing the analytics infrastructure. Time devoted to data wrangling and DevOps is a dead loss; it reduces the amount of time data scientists can spend delivering real value to clients.

Small data tools. Data analytics software introduced before 2012 runs on single machines only; this includes most commercial software for analytics as well as open source R and Python. When the volume of data exceeds the capacity of the computer, runtime performance degrades or jobs fail. Data scientists working with these tools must invest time in workarounds, such as sampling, filtering or aggregating. In addition to taking time, these techniques reduce the amount of data available for analysis, which affects quality.
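To make the sampling and aggregating workarounds concrete, here is a minimal Python sketch under stated assumptions. The `meter_readings` generator is a hypothetical stand-in for a source too large to load at once, and the function names are illustrative rather than taken from any particular tool. It shows reservoir sampling (keeping a fixed-size uniform sample from a stream) and a streaming mean (aggregating without holding the data in memory).

```python
import random
from typing import Iterable, Iterator, List


def meter_readings(n: int) -> Iterator[float]:
    """Hypothetical data source: yields n readings one at a time."""
    for i in range(n):
        yield float(i % 100)


def reservoir_sample(rows: Iterable[float], k: int, seed: int = 0) -> List[float]:
    """Keep a uniform random sample of at most k rows from a stream."""
    rng = random.Random(seed)
    sample: List[float] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


def streaming_mean(rows: Iterable[float]) -> float:
    """Aggregate a stream to its mean while holding only two numbers."""
    total = 0.0
    count = 0
    for row in rows:
        total += row
        count += 1
    if count == 0:
        raise ValueError("cannot take the mean of an empty stream")
    return total / count
```

Both functions fit on one machine because they read one row at a time, but the trade-off described above still applies: the sample discards most rows, and the aggregate discards row-level detail.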


Complex and diverse data sources. Organizations use a wide variety of data management platforms to manage the flood of Big Data, including relational databases; Hadoop; NoSQL data stores; cloud storage; and many others. These platforms are often “siloed” from one another. The data in those platforms can be structured, semi-structured and unstructured; static and streaming; cleansed and uncleansed. Legacy analytic software is not designed to handle complex data; the user must use other tools, such as Hive or Pig, or write custom code.

Single-threaded software. Legacy software scales up, not out. If you want more computing power, you’ll have to buy a bigger machine. In addition to limiting the amount of data you can analyze, this also means that tasks run serially, one after the other. For a complex task, that can take days or even weeks.
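The difference between running a task serially and scaling it out can be sketched in a few lines of Python. This is an illustration only, not any vendor's implementation: the helper names are hypothetical, and a thread pool stands in for the separate processes or machines a real distributed platform would use. The pattern is to partition the data, compute a partial result per partition, and combine the partials.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import List, Sequence


def partition(data: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Split data into at most `parts` contiguous chunks of near-equal size."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk: Sequence[int]) -> int:
    """The per-chunk work; any associative aggregate would do."""
    return sum(x * x for x in chunk)


def serial_total(data: Sequence[int]) -> int:
    """Scale-up style: one worker processes everything in order."""
    return sum_of_squares(data)


def distributed_total(data: Sequence[int], executor: Executor, parts: int) -> int:
    """Scale-out style: workers handle chunks, then partials are combined."""
    partials = executor.map(sum_of_squares, partition(data, parts))
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1000))
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(distributed_total(data, pool, parts=4) == serial_total(data))
```

Because the combining step only needs the partial results, adding workers (or machines) shortens the job without changing the answer, which is the property single-threaded tools lack.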

Complex infrastructure. Jeff Magnusson, Director of Algorithms Platform at online retailer Stitch Fix, notes that data science teams typically include groups of engineers who spend most of their time keeping the infrastructure running. Data science teams often manage their platforms because clients have urgent needs, the technology is increasingly sophisticated, and corporate IT budgets are lean.


It doesn’t make sense to hire highly paid employees with skills in advanced analytics, then put them to work cleaning up data and managing clusters. Visionary data scientists seek tools and platforms that are scalable; interoperable with Big Data platforms; distributed; and elastic.

Scalability. Some academics question the value of working with large datasets. For data scientists, however, the question is moot; you can’t escape using large datasets even if you agree with the academics. Why? Because the data you need for your analysis comes from a growing universe of data; and, if you build a predictive model, your organization will need to score large volumes of data. You don’t have a choice; large datasets are a fact of life, and your tools must reflect this reality.

Integrated with Big Data platforms. As a data scientist, you may have little or no control over the structure of the data you need to analyze or the platforms your organization uses to manage data.

 


