Evaluating Data Science Projects

I’ve written two blog posts on evaluation, the broccoli of Machine Learning. There are actually two closely related concerns under the rubric of evaluation:

Both types are important not only to data scientists but also to managers and executives, who must evaluate project proposals and results. To managers I would say: It’s not necessary to understand the inner workings of a machine learning project, but you should understand whether the right things have been measured and whether the results are suited to the business problem. You need to know whether to believe what data scientists are telling you.

To this end, here I’ll evaluate a machine learning project report. I found this work described as a customer success story on a popular machine learning blog. The write-up was posted in early 2017, along with a related video presenting the results. Some aspects are confusing, as you’ll see, but I haven’t sought clarification from the authors because I wanted to critique it just as reported. This makes for a realistic case study: you often have to evaluate projects with missing or confusing details.

Along the way, we’ll uncover some common mistakes in applying machine learning that even professional data scientists can make.

The problem is presented as follows: A large insurance company wants to predict especially large insurance claims. Specifically, they divide their population into drivers who report an accident (7–10%), drivers who have no accidents (90–93%), and so-called large-loss drivers who report an accident involving damages of $10,000 or more (about 1% of their population). They want to detect only the last group, the drivers involved in large, expensive claims. They are therefore facing a two-class problem, whose classes they call Large Loss and Non-Large Loss.
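To make the class definition concrete, here is a minimal sketch of the labeling rule as the write-up describes it. It assumes each historical record carries a single claim amount in dollars, with None meaning no accident; the `label_driver` helper and the sample amounts are hypothetical, not taken from the original project.

```python
# Threshold from the problem statement: damages of $10,000 or more.
LARGE_LOSS_THRESHOLD = 10_000


def label_driver(claim_amount):
    """Map a (hypothetical) claim amount to one of the two classes; None means no accident."""
    if claim_amount is not None and claim_amount >= LARGE_LOSS_THRESHOLD:
        return "Large Loss"
    # No-accident drivers and drivers with smaller accidents share one negative class.
    return "Non-Large Loss"


# Invented sample of five drivers' claim amounts.
records = [None, 2_500.0, 10_000.0, None, 48_000.0]
labels = [label_driver(amount) for amount in records]
print(labels)
```

The rule merges the no-accident group and the ordinary-accident group into a single negative class, which is what turns the three-way population breakdown into a two-class problem.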

Readers of my prior posts may recall my discussion of how common unbalanced classes are in real-world machine learning problems. Indeed, here we see a 99:1 skew where the positive (Large-Loss) instances are outnumbered by the uninteresting negative instances by about two orders of magnitude. (By the way, this would be considered very skewed by ML research standards, though it’s on the lighter side by real-world standards.) Because of this skew, we have to be careful about how we evaluate any model.
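A quick calculation shows why the skew matters. The sketch below assumes a hypothetical test set of 10,000 drivers containing exactly 1% Large Loss drivers, and scores a trivial "model" that always predicts Non-Large Loss.

```python
# Hypothetical test set: 100 Large Loss (1) and 9,900 Non-Large Loss (0) drivers.
y_true = [1] * 100 + [0] * 9_900

# A useless model that always predicts the majority class (Non-Large Loss).
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model flags."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual = sum(1 for t in y_true if t == positive)
    return hits / actual


print(f"accuracy = {accuracy(y_true, y_pred):.2%}")          # prints accuracy = 99.00%
print(f"Large Loss recall = {recall(y_true, y_pred):.2%}")   # prints Large Loss recall = 0.00%
```

The model is 99% accurate yet catches none of the drivers the insurer actually cares about, which is why overall accuracy cannot be trusted on a problem this skewed.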

Their approach was fairly straightforward. They had a historical sample of previous drivers’ records on which to train and test. They represented each driver’s record using 70 features, encompassing both categorical and numerical features, although the write-up shows only a few of them.
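For readers unfamiliar with mixed feature types, here is a minimal sketch of one common way to turn a record with categorical and numerical fields into the flat numeric vector a neural network expects: one-hot encoding for categories and standardization for numbers. The three feature names, their category lists, and the mean/standard-deviation values are all hypothetical; the write-up does not say how its 70 features were encoded.

```python
# Hypothetical schema: the real project used about 70 features; these three are invented.
CATEGORIES = {"vehicle_type": ["sedan", "suv", "truck"]}
NUMERIC_STATS = {"driver_age": (45.0, 15.0), "annual_mileage": (12_000.0, 4_000.0)}  # (mean, std)


def encode(record):
    """Turn one driver record (a dict) into a flat list of floats."""
    vector = []
    for name, values in CATEGORIES.items():
        # One-hot encoding: one 0/1 slot per known category value (unknown values -> all zeros).
        vector.extend(1.0 if record[name] == v else 0.0 for v in values)
    for name, (mean, std) in NUMERIC_STATS.items():
        # Standardize numeric features so they share a comparable scale.
        vector.append((record[name] - mean) / std)
    return vector


print(encode({"vehicle_type": "suv", "driver_age": 30, "annual_mileage": 16_000}))
```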

They state that their client had previously used a Random Forest to solve this problem. A Random Forest is a well-known, popular technique that builds an ensemble of decision trees to classify instances. They hope to do better using a deep neural network. Their network design is as follows:
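The "ensemble" part of a Random Forest can be sketched in a few lines. The code below shows only the aggregation step, under a loud assumption: the three "trees" are hypothetical hand-written rules standing in for trees that a real Random Forest would learn from bootstrap samples of the training data, choosing among random subsets of features at each split. The classic formulation takes a majority vote; some implementations, such as scikit-learn's RandomForestClassifier, average per-tree class probabilities instead of counting hard votes.

```python
# Hypothetical stand-ins for learned decision trees: each maps a record to 1 (Large Loss) or 0.
def tree_a(r):
    return 1 if r["prior_claims"] >= 2 else 0


def tree_b(r):
    return 1 if r["annual_mileage"] > 20_000 else 0


def tree_c(r):
    return 1 if r["driver_age"] < 25 and r["prior_claims"] >= 1 else 0


FOREST = [tree_a, tree_b, tree_c]


def forest_vote_fraction(record, forest=FOREST):
    """Fraction of trees voting Large Loss; usable as a score, not just a label."""
    return sum(tree(record) for tree in forest) / len(forest)


def forest_predict(record, forest=FOREST):
    """Majority vote: Large Loss if more than half the trees say so."""
    return 1 if forest_vote_fraction(record, forest) > 0.5 else 0


driver = {"prior_claims": 2, "annual_mileage": 8_000, "driver_age": 22}
print(forest_vote_fraction(driver), forest_predict(driver))
```

Note that the vote fraction is already a score, so a Random Forest baseline can be thresholded and compared on a ROC or Precision-Recall curve just like a probability-emitting network.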

The model is a fully connected neural network with three hidden layers that use ReLU activation functions. They state that data from Google Compute Engine was used to train the model (implemented in TensorFlow), and Cloud Machine Learning Engine’s HyperTune feature was used to tune hyperparameters.
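For readers who want to see what "fully connected with three ReLU hidden layers" means computationally, here is a minimal forward-pass sketch in plain Python. It is not the project's TensorFlow model: the hidden-layer sizes (64, 32, 16), the random initialization, and the placeholder input are assumptions, and training is omitted entirely. The output is a single sigmoid unit giving P(Large Loss), which for two classes is equivalent to a two-way softmax; this is the kind of probability output the author recommends below, not the project's two-ReLU output.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def identity(x):
    return x


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense(inputs, weights, biases, activation):
    """Fully connected layer: every output unit is a weighted sum of every input."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Hypothetical initialization: small Gaussian weights, zero biases."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(n_features, hidden_sizes, seed=0):
    """Layer list: one ReLU layer per entry in hidden_sizes, then a single output unit."""
    rng = random.Random(seed)
    sizes = [n_features, *hidden_sizes, 1]
    return [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]


def predict_proba(network, x):
    """Forward pass: ReLU on each hidden layer, sigmoid on the single output unit."""
    *hidden_layers, (w_out, b_out) = network
    for weights, biases in hidden_layers:
        x = dense(x, weights, biases, relu)
    (logit,) = dense(x, w_out, b_out, identity)
    return sigmoid(logit)


# 70 input features as in the write-up; the hidden sizes are invented.
net = build_network(n_features=70, hidden_sizes=[64, 32, 16])
features = [0.1] * 70  # placeholder for an encoded driver record
p = predict_proba(net, features)
print(f"P(Large Loss) = {p:.3f}")
```

Because the output is a probability rather than a hard label, it can be thresholded at any point to trade recall against false alarms.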

I have no reason to doubt their representation choices or network design, but one thing looks odd. Their output is two ReLU (rectifier) units, each emitting the network’s accuracy (technically: recall) on that class. I would’ve chosen a single output representing the probability that a driver is a Large Loss driver (a sigmoid unit, or equivalently a two-way Softmax), from which I could get a ROC or Precision-Recall curve. I could then threshold the output to get any achievable performance on the curve. (I explain the advantages of scoring over hard classification in another post.)
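To illustrate the advantage of a score, here is a minimal sketch of sweeping a threshold over held-out scores to trace out ROC and Precision-Recall points, then picking an operating point that meets a recall target. The scores, labels, and the 0.9 recall target are invented for illustration; in practice the scores would come from the model on a test set, and a library routine (for instance scikit-learn's `roc_curve` and `precision_recall_curve`) would typically do the sweep.

```python
def confusion_at(threshold, scores, labels):
    """Count outcomes when every score >= threshold is called Large Loss (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    return tp, fp, fn, tn


def curve_points(scores, labels):
    """One (threshold, FPR, recall/TPR, precision) tuple per distinct score, highest first."""
    points = []
    for t in sorted(set(scores), reverse=True):
        tp, fp, fn, tn = confusion_at(t, scores, labels)
        points.append((t, fp / (fp + tn), tp / (tp + fn), tp / (tp + fp)))
    return points


def threshold_for_recall(points, target):
    """Highest threshold whose recall meets the target (recall only grows as t falls)."""
    for t, _fpr, tpr, _precision in points:
        if tpr >= target:
            return t
    return None


# Hypothetical held-out scores (model's P(Large Loss)) and true labels.
scores = [0.95, 0.80, 0.70, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

pts = curve_points(scores, labels)
for t, fpr, tpr, prec in pts:
    print(f"threshold={t:.2f}  FPR={fpr:.2f}  recall={tpr:.2f}  precision={prec:.2f}")
print("threshold for recall >= 0.9:", threshold_for_recall(pts, 0.9))
```

Each threshold is one point on the curve; a model that emits only hard per-class outputs gives you a single point and no way to move along it.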
