The data science project lifecycle

What does the typical data science project life-cycle look like?

This post looks at the practical aspects of implementing data science projects. It assumes a certain level of maturity in big data (more on big data maturity models in the next post) and in data science management within the organization. The life cycle presented here therefore differs, sometimes significantly, from purist definitions of 'science' that emphasize the hypothesis-testing approach. In practice, the typical data science project life-cycle looks more like an engineering process, shaped by resource constraints (budget, data and skills availability) and time-to-market considerations.

The CRISP-DM model (CRoss Industry Standard Process for Data Mining) has traditionally defined six steps in the data mining life-cycle. Data science is similar to data mining in several aspects, hence there's some similarity with these steps.

The CRISP-DM steps are: 1. Business Understanding 2. Data Understanding 3. Data Preparation 4. Modeling 5. Evaluation 6. Deployment

Given a certain level of maturity in big data and data science expertise within the organization, it is reasonable to assume that a library of assets related to data science implementations is available. Key among these are: 1. A library of business use cases for big data / data science applications 2. A matrix mapping data requirements to business use cases 3. Minimum data quality requirements (test cases that verify the data meets a minimum level of quality, so that the project is feasible)

In most organizations, data science is a fledgling discipline, so data scientists (except those from an actuarial background) are likely to have limited business domain expertise. They therefore need to be paired with business people and with those who understand the data. This pairing helps data scientists work through steps 1 and 2 of the CRISP-DM model, i.e. business understanding and data understanding.

The typical data science project then becomes an engineering exercise: a defined framework of steps or phases, each with exit criteria. Pre-defined criteria allow informed decisions on whether to continue a project, which optimizes resource utilization and maximizes the benefits from the project. They also prevent the project from degrading into a money pit through the pursuit of nonviable hypotheses and ideas.

The data science life-cycle thus looks somewhat like this: 1. Data acquisition 2. Data preparation 3. Hypothesis and modeling 4. Evaluation and interpretation 5. Deployment 6. Operations 7. Optimization
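One way to picture the combination of phases and exit criteria is as a sequence of gates that a project must pass in order. The sketch below is a minimal illustration, not a prescribed implementation: the `Gate` class, the `run_project` function, the metric names (`rows_acquired`, `usable_fraction`, `holdout_auc`) and all thresholds are hypothetical, and only the first three phases are given gates to keep it short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Metrics = Dict[str, float]


@dataclass
class Gate:
    """Exit criterion attached to one life-cycle phase (hypothetical structure)."""
    phase: str
    criterion: str
    is_met: Callable[[Metrics], bool]


def run_project(gates: List[Gate], metrics: Metrics) -> Tuple[List[str], Optional[str]]:
    """Walk the phases in order and stop at the first unmet exit criterion.

    Returns the phases that passed their gate and, if the project was
    stopped, a short reason; the reason is None when every gate passed.
    """
    completed: List[str] = []
    for gate in gates:
        if not gate.is_met(metrics):
            return completed, f"stopped at '{gate.phase}': {gate.criterion}"
        completed.append(gate.phase)
    return completed, None


# Illustrative gates and thresholds; real criteria are agreed before the project starts.
gates = [
    Gate("Data acquisition", "at least 1000 rows acquired",
         lambda m: m.get("rows_acquired", 0.0) >= 1000),
    Gate("Data preparation", "at least 80% of rows usable",
         lambda m: m.get("usable_fraction", 0.0) >= 0.8),
    Gate("Hypothesis and modeling", "holdout AUC of at least 0.70",
         lambda m: m.get("holdout_auc", 0.0) >= 0.70),
]

metrics = {"rows_acquired": 5000, "usable_fraction": 0.9, "holdout_auc": 0.62}
completed, reason = run_project(gates, metrics)
```

Here the project clears the first two gates and is stopped at modeling because the illustrative AUC threshold is not met, which is the kind of early, criteria-based decision the framework is meant to support.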

Data Acquisition - may involve acquiring data from both internal and external sources, including social media or web scraping.

 


