Data Science Platforms: What are they? And why are they important?

As more companies recognize the need for a data science platform, more vendors are claiming they have one. Increasingly, we see companies describing their product as a “data science platform” without describing the features that make platforms so valuable. So we wanted to share our vision for the core capabilities a platform should have in order for it to be valuable to data science teams.

We see “the data science lifecycle” spanning three phases: ideation and exploration, experimentation, and operationalization. Each phase has distinct demands that motivate capabilities for a data science platform.

To some degree, all data science projects go through these phases.

We’ll discuss each of these in turn, describing the challenges involved and the capabilities a good data science platform should provide.

Quantitative research starts with exploring the data to understand what you have. This might mean plotting data in different ways, examining different features, looking at the values of different variables, etc.
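To make that first look at the data concrete, here is a minimal sketch of a per-column summary, using only the Python standard library. The `summarize` helper and the `trips` records are hypothetical and made up for illustration; in practice most teams would reach for a dataframe library and plots instead.

```python
from collections import Counter
from statistics import mean


def summarize(rows):
    """Return a per-column summary for a list of dict records."""
    columns = sorted({key for row in rows for key in row})
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        info = {"count": len(present), "missing": len(values) - len(present)}
        is_numeric = present and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        if is_numeric:
            # Numeric column: report range and average.
            info.update(min=min(present), max=max(present), mean=mean(present))
        else:
            # Anything else: report the most frequent values.
            info["top_values"] = Counter(present).most_common(3)
        summary[col] = info
    return summary


# Hypothetical sample data, invented for this example.
trips = [
    {"city": "Austin", "distance_km": 4.0, "rating": 5},
    {"city": "Austin", "distance_km": 6.0, "rating": None},
    {"city": "Boston", "distance_km": 2.0, "rating": 3},
]

for column, info in summarize(trips).items():
    print(column, info)
```

Even a summary this small surfaces the kinds of things exploration is meant to find, such as missing values in one column or a category that dominates another.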

Ideation and exploration can be time-consuming. The data sets can be large and unwieldy, or you may want to try new packages or tools. If you’re working on a team and have no way of seeing work others have already done, you might end up redoing it. Other people may have already developed insights, created clean data sets, or determined which features are useful and which are not.

Through the process of exploring data, researchers formulate ideas they want to test. At this point, research often shifts from ad hoc work in notebooks to more hardened batch scripts. People run an experiment, review the results, and make changes based on what they’ve learned.

This phase can be slow when experiments are computationally intensive (e.g., model training tasks). This is also where the “science” part of data science can be especially important: tracking variations in your experiments, ensuring past results are reproducible, and getting feedback through a peer review process.
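As a rough illustration of what tracking variations and checking reproducibility can look like, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: `run_id`, `run_experiment`, and `track` are hypothetical helpers, and the "experiment" is a seeded stand-in rather than a real training job. It is not how any particular platform implements tracking.

```python
import hashlib
import json
import random


def run_id(params):
    """Stable identifier derived from the parameters, so identical configs match."""
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def run_experiment(params):
    """Stand-in for a real training job: a seeded, deterministic pseudo-score."""
    rng = random.Random(params["seed"])
    noise = rng.uniform(-0.01, 0.01)
    return {"score": round(0.8 + 0.05 * params["depth"] / 10 + noise, 4)}


def track(log, params):
    """Run one experiment and append a record of its inputs and results to the log."""
    record = {"id": run_id(params), "params": params, "metrics": run_experiment(params)}
    log.append(record)
    return record


log = []
first = track(log, {"depth": 4, "seed": 42})
second = track(log, {"depth": 8, "seed": 42})
rerun = track(log, {"seed": 42, "depth": 4})  # same config, different key order

# Reproducibility check: the same parameters give the same id and the same metrics.
assert rerun["id"] == first["id"]
assert rerun["metrics"] == first["metrics"]

print(json.dumps(log, indent=2))
```

The design choice worth noting is that each record stores the full parameter set alongside the results, so a reviewer can rerun any entry and compare, which is the kind of bookkeeping a platform would ideally handle automatically.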

Data science work is only valuable insofar as it creates some impact on business outcomes. That means the work must be operationalized or productionized somehow, i.e., it must be integrated into business processes or decision-making processes.
