Teaching the Data Science Process

Teaching the Data Science Process

Teaching the Data Science Process

Understanding the process requires not only wide technical background in machine learning but also basic notions of businesses administration; here I will share my experience on teaching the data science process.

Curricula for teaching machine learning have existed for decades and even more recent technical subjects (deep learning or big data architectures) have almost standard course outlines and linearized storylines. On the other hand, teaching support for the data science processhas been elusive, even though the outlines of the process have been around since the 90s. Understanding the process requires not only wide technical background in machine learning but also basic notions of businesses administration. I have elaborated on the organizational difficulties of data science transformation stemming from these complexities in a previous essay; here I will share my experience on teaching the data science process.

The data science ecosystem. Data scientists “B” is in key position in formalizing the business problem and designing the data science workflow.

Read Also:
Best Programming Languages for Big Data

  I recently had the opportunity to try some experimental pedagogical techniques on about hundred top tier engineering students from Ecole Polytechnique. The central concept of the course was the data science workflow.

None of these two can be taught using linearized narratives in slide-based lectures. I built the course around our RAMP concept using our platform. To learn workflow optimization, students participated in five RAMPs, designed to challenge them on different scientific workflows and on different data science problems. To learn workflow design, I covered a couple of data-driven business cases, gave students a linear guide with specific questions to answer to, and asked them to build business cases and data science workflows in group projects. I used the RAMP starting kits as samples: limiting the infinite design space helped students to structure the projects.

  The RAMP was originally designed for a collaborative prototyping tool that makes efficient use of the time of data scientists in solving the data analytics segment of domain science or business problems. We then realized very soon that it is equally valuable for training novice data scientists. The main design feature we needed to change was complete openness. To be able to grade students based on individual performance, we needed to close the leaderboard. In the closed phase students see each other’s scores but not each other’s codes. We grade them using a capped linear function of their score. This typically 1–2 week long closed phase is followed by a “classical” open RAMP in which we grade students based on their activities and their ability of generating diversity and improving their own closed phase score.

Read Also:
Big Data Mistakes That Most Companies Make

The collective performance of the students was nothing short of spectacular. In all five RAMPs they beat not only the baseline but also the single day hackaton scores that we organized to test the workflows, with typically 30–50 top data scientists and domain scientists participating.

Score vs submission timestamp of the first classroom RAMP. Blue and red circles represent submissions in the closed and open phases, respectively.



Data Science Congress 2017

5
Jun
2017
Data Science Congress 2017

20% off with code 7wdata_DSC2017

Read Also:
Data. No Big Thing (Yet…)

AI Paris

6
Jun
2017
AI Paris

20% off with code AIP17-7WDATA-20

Read Also:
Data for all: SMES, scalability and the big Big Data tools

Chief Data Officer Summit San Francisco

7
Jun
2017
Chief Data Officer Summit San Francisco

$200 off with code DATA200

Read Also:
Data. No Big Thing (Yet…)

Customer Analytics Innovation Summit Chicago

7
Jun
2017
Customer Analytics Innovation Summit Chicago

$200 off with code DATA200

Read Also:
Business leaders urge new thinking in age of artificial intelligence
Read Also:
Mini-glossary: Big data terms you should know

Big Data and Analytics Marketing Summit London

12
Jun
2017
Big Data and Analytics Marketing Summit London

$200 off with code DATA200

Read Also:
How to Build a Data Strategy Pt. II

Leave a Reply

Your email address will not be published. Required fields are marked *