The data science project lifecycle

What does the typical data science project life-cycle look like?

This post looks at the practical aspects of implementing data science projects. It assumes a certain level of maturity in big data (more on big data maturity models in the next post) and in data science management within the organization. The life-cycle presented here therefore differs, sometimes significantly, from purist definitions of 'science' that emphasize the hypothesis-testing approach. In practice, the typical data science project life-cycle looks more like an engineering process, shaped by resource constraints (budget, data and skills availability) and time-to-market considerations.

The CRISP-DM model (CRoss Industry Standard Process for Data Mining) has traditionally defined six steps in the data mining life-cycle. Data science resembles data mining in several respects, so its life-cycle shares some of these steps.

The CRISP-DM model steps are:
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment

Given a certain level of maturity in big data and data science expertise within the organization, it is reasonable to assume that a library of assets related to data science implementations is available. Key among these are:
1. A library of business use-cases for big data / data science applications
2. A matrix mapping data requirements to business use-cases
3. Minimum data quality requirements (test cases that verify the data meets the minimum quality level needed for a project to be feasible)
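As an illustration of the third asset, minimum data quality requirements can be written as small, repeatable test cases. The sketch below is only one possible shape, not a prescribed method: the column names, the sample rows and the thresholds (`min_rows`, `min_completeness`) are hypothetical and would come from the use-case mapping in a real project.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# A row is a plain dict; None or "" marks a missing value (an assumption of this sketch).
Row = Dict[str, Optional[object]]


@dataclass
class QualityCheck:
    name: str
    passed: bool
    detail: str


def completeness(rows: List[Row], column: str) -> float:
    """Return the fraction of rows that have a non-missing value in `column`."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def run_minimum_quality_checks(
    rows: List[Row],
    required_columns: List[str],
    min_rows: int,
    min_completeness: float,
) -> List[QualityCheck]:
    """Run the row-count check and one completeness check per required column."""
    checks = [
        QualityCheck(
            name="row_count",
            passed=len(rows) >= min_rows,
            detail=f"{len(rows)} rows (minimum {min_rows})",
        )
    ]
    for column in required_columns:
        ratio = completeness(rows, column)
        checks.append(
            QualityCheck(
                name=f"completeness:{column}",
                passed=ratio >= min_completeness,
                detail=f"{ratio:.0%} filled (minimum {min_completeness:.0%})",
            )
        )
    return checks


# Hypothetical sample for a churn use-case; one 'churned' value is missing.
sample_rows: List[Row] = [
    {"customer_id": 1, "churned": "no"},
    {"customer_id": 2, "churned": None},
    {"customer_id": 3, "churned": "yes"},
    {"customer_id": 4, "churned": "no"},
]

results = run_minimum_quality_checks(
    sample_rows,
    required_columns=["customer_id", "churned"],
    min_rows=3,
    min_completeness=0.9,
)
feasible = all(check.passed for check in results)

for check in results:
    print(f"{'PASS' if check.passed else 'FAIL'}  {check.name}: {check.detail}")
print("Feasible to proceed:", feasible)
```

Keeping the checks as data (a list of named results) rather than bare assertions makes it easy to report which requirement failed when a use-case turns out not to be feasible.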

In most organizations, data science is a fledgling discipline, so data scientists (except those from an actuarial background) are likely to have limited business domain expertise. They therefore need to be paired with business people and with those who understand the data. This pairing helps data scientists work through steps 1 and 2 of the CRISP-DM model, i.e. business understanding and data understanding.

The typical data science project then becomes an engineering exercise: a defined framework of steps or phases, each with exit criteria. Pre-defined criteria allow informed decisions on whether to continue a project, which optimizes resource utilization and maximizes the benefits from the project. They also prevent projects from degrading into money pits through the pursuit of nonviable hypotheses and ideas.
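The idea of phases with pre-defined exit criteria can be sketched in a few lines of code. This is a minimal illustration, assuming each criterion is a simple check against measured project metrics; the `Phase` class, the metric names (`auc`, `spend_ratio`) and the thresholds are hypothetical and not taken from any standard.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Measured values for a project at the end of a phase (names are illustrative).
Metrics = Dict[str, float]


@dataclass
class Phase:
    name: str
    # Maps a human-readable criterion label to a check on the metrics.
    exit_criteria: Dict[str, Callable[[Metrics], bool]] = field(default_factory=dict)

    def unmet(self, metrics: Metrics) -> List[str]:
        """Return the labels of all exit criteria that the metrics do not satisfy."""
        return [label for label, check in self.exit_criteria.items() if not check(metrics)]


def gate_decision(phase: Phase, metrics: Metrics) -> str:
    """Decide whether the project continues past `phase` or stops."""
    failures = phase.unmet(metrics)
    if failures:
        return "stop: " + ", ".join(failures)
    return "continue"


# Hypothetical exit criteria for a modeling phase.
modeling = Phase(
    name="Hypothesis and modeling",
    exit_criteria={
        "AUC at least 0.70": lambda m: m.get("auc", 0.0) >= 0.70,
        "spend within budget": lambda m: m.get("spend_ratio", float("inf")) <= 1.0,
    },
)

print(gate_decision(modeling, {"auc": 0.64, "spend_ratio": 0.8}))
print(gate_decision(modeling, {"auc": 0.75, "spend_ratio": 0.8}))
```

Because the criteria are fixed before the phase starts, the continue-or-stop decision does not depend on how much effort has already been sunk into a hypothesis.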

The data science life-cycle thus looks somewhat like this:
1. Data acquisition
2. Data preparation
3. Hypothesis and modeling
4. Evaluation and interpretation
5. Deployment
6. Operations
7. Optimization

Data Acquisition - may involve acquiring data from both internal and external sources, including social media or web scraping.

 


