The data science project lifecycle

What does the typical data science project lifecycle look like?

This post looks at the practical aspects of implementing data science projects. It assumes a certain level of maturity in big data (more on big data maturity models in the next post) and in data science management within the organization. The lifecycle presented here therefore differs, sometimes significantly, from purist definitions of 'science' that emphasize the hypothesis-testing approach. In practice, the typical data science project lifecycle looks more like an engineering process, shaped by resource constraints (budget, data and skills availability) and time-to-market considerations.

The CRISP-DM model (Cross-Industry Standard Process for Data Mining) has traditionally defined six steps in the data mining lifecycle. Data science resembles data mining in several respects, so its lifecycle shares some of these steps.

The CRISP-DM steps are:
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment

Given a certain level of maturity in big data and data science expertise within the organization, it is reasonable to assume that a library of assets related to data science implementations is available. Key among these are:
1. A library of business use cases for big data and data science applications
2. A mapping matrix of data requirements to business use cases
3. Minimum data quality requirements (test cases that verify the data meets the minimum quality needed for a project to be feasible)
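To make the idea of minimum data quality test cases concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than part of any standard: the column names, the sample rows, the thresholds and the helper names `QualityCheck`, `completeness` and `run_checks` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class QualityCheck:
    """A named test case that a dataset must pass (hypothetical helper)."""
    name: str
    passes: Callable[[List[Row]], bool]


def completeness(rows: List[Row], column: str) -> float:
    """Share of rows in which `column` holds a non-empty value."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def run_checks(rows: List[Row], checks: List[QualityCheck]) -> Dict[str, bool]:
    """Run every check and report pass/fail by check name."""
    return {check.name: check.passes(rows) for check in checks}


# Made-up sample data, for illustration only.
sample_rows = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": None},
    {"customer_id": 3, "age": 51},
    {"customer_id": 4, "age": 29},
]

# Assumed thresholds; a real project would take them from its own requirements.
checks = [
    QualityCheck("enough_rows", lambda rows: len(rows) >= 3),
    QualityCheck("age_at_least_90pct_complete",
                 lambda rows: completeness(rows, "age") >= 0.9),
    QualityCheck("customer_id_unique",
                 lambda rows: len({r["customer_id"] for r in rows}) == len(rows)),
]

results = run_checks(sample_rows, checks)
feasible = all(results.values())
print(results, feasible)
```

In this sample only three of the four rows have an age, so the completeness check fails and the dataset would not meet the assumed minimum for feasibility.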

In most organizations, data science is a fledgling discipline, so data scientists (except those from an actuarial background) are likely to have limited business domain expertise. They therefore need to be paired with business people and with those who understand the data. This pairing helps data scientists gain, or work jointly on, steps 1 and 2 of the CRISP-DM model: business understanding and data understanding.

The typical data science project then becomes an engineering exercise: a defined framework of steps or phases, each with exit criteria. Pre-defined criteria allow informed decisions on whether to continue a project, which optimizes resource utilization and maximizes the benefits from it. This also prevents projects from degrading into money pits through the pursuit of nonviable hypotheses and ideas.

The data science lifecycle thus looks somewhat like this:
1. Data acquisition
2. Data preparation
3. Hypothesis and modeling
4. Evaluation and interpretation
5. Deployment
6. Operations
7. Optimization
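The phases above, combined with the exit criteria discussed earlier, can be sketched as a sequence of gates. This is only an illustration under stated assumptions: the metric names, the thresholds and the names `Phase` and `first_failed_gate` are hypothetical, and only the first three phases are modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Metrics = Dict[str, float]


@dataclass
class Phase:
    """A lifecycle phase with a pre-defined exit criterion (hypothetical helper)."""
    name: str
    exit_criterion: Callable[[Metrics], bool]


def first_failed_gate(phases: List[Phase], metrics: Metrics) -> Optional[str]:
    """Return the name of the first phase whose exit criterion is not met, or None."""
    for phase in phases:
        if not phase.exit_criterion(metrics):
            return phase.name
    return None


# Assumed criteria; real thresholds would be agreed before the project starts.
phases = [
    Phase("Data acquisition", lambda m: m.get("rows_acquired", 0) >= 10_000),
    Phase("Data preparation", lambda m: m.get("share_complete", 0.0) >= 0.95),
    Phase("Hypothesis and modeling", lambda m: m.get("validation_auc", 0.0) >= 0.70),
]

# Made-up project metrics, for illustration only.
project_metrics = {
    "rows_acquired": 12_000,
    "share_complete": 0.97,
    "validation_auc": 0.61,
}

stopped_at = first_failed_gate(phases, project_metrics)
print(stopped_at or "all gates passed")
```

With these made-up numbers the project clears the first two gates and stops at the modeling gate, which is the point at which the team would decide whether to continue or abandon the hypothesis.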

Data acquisition - this may involve acquiring data from both internal and external sources, including social media and web scraping.

 


