Benchmarking Predictive Models


It’s been said that debugging is harder than programming. If we, as data scientists, are developing models (“programming”) at the limits of our understanding, then we’re probably not smart enough to validate those models (“debug”) effectively.

There’s a bewildering number of predictive modeling techniques and software libraries available to those of us building models. As a result, we’re likely to end up with multiple models in the course of attempting to solve a single problem. It’s important to understand the performance of these models and to provide some measure of confidence about which of them will achieve the business objectives of the modeling exercise. That’s where benchmarking comes in.

Benchmarks will always be incomplete and partial, but if we are careful, they will be directionally correct: Correct enough that we can make the right choices, and in so doing, make our organizations smarter, more efficient, and more driven by insights.

Benchmarking machine learning models specifically is a challenging endeavor. Even simple changes in usage, software libraries, hyperparameter tuning or compute environment can cause drastic changes in behavior. In this article we will outline the best practices for comparing machine learning models, executing benchmarks, and ensuring that experiments are reproducible.

While it is important to benchmark the predictive performance of a machine learning model, there are also important operational constraints that require consideration. These include the training and runtime performance characteristics of a model. That is, how long it takes to train a model, how long it takes to score new data, and the compute resources required to accomplish both of these.
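As a rough illustration, the sketch below times model training and batch scoring and records peak traced memory. It assumes scikit-learn is available; the synthetic dataset and logistic regression model are placeholders for whatever is actually being benchmarked.

```python
# A minimal sketch, assuming scikit-learn; the dataset and model are
# stand-ins for whatever is being benchmarked.
import time
import tracemalloc

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100_000, n_features=50, random_state=0)
model = LogisticRegression(max_iter=1000)

# Training time and (approximate, Python-level) peak memory during fit.
tracemalloc.start()
t0 = time.perf_counter()
model.fit(X, y)
train_seconds = time.perf_counter() - t0
_, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Batch scoring time over the same rows.
t0 = time.perf_counter()
model.predict_proba(X)
score_seconds = time.perf_counter() - t0

print(f"train: {train_seconds:.2f}s, peak memory: {peak_bytes / 1e6:.1f} MB")
print(f"batch scoring ({len(X)} rows): {score_seconds:.2f}s")
```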

Benchmarking operational characteristics may often be as important as evaluating the predictive characteristics of a model. For example, consider the problem of real-time bidding on an ad-buying network. The total latency budget for a bid, covering network access, database queries and prediction, is 100ms. The 80ms spent on calculating a prediction in a deep learning model may mean that the model does not meet the business requirements. A logistic regression model, which returns a result in 5ms, may be more suitable, despite the deep learning model’s increased predictive power. A benchmark that only focuses on the predictive performance of a model would ignore this important aspect of putting a model into production.
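To check a model against a latency budget like this, one approach is to time many single-row predictions and look at the tail percentiles rather than the mean. A sketch under the same assumptions as the snippet above (the `model` and `X` placeholders, plus the 100ms budget from the example):

```python
# Minimal latency-budget check: time single-row predictions and report
# tail percentiles. The model, data, and 100ms budget are the illustrative
# assumptions from the example above.
import time
import numpy as np

LATENCY_BUDGET_MS = 100.0

def latency_profile(model, X, n_requests=1000):
    """Time single-row predictions and return latency percentiles in ms."""
    rows = X[np.random.randint(0, len(X), size=n_requests)]
    timings = []
    for row in rows:
        t0 = time.perf_counter()
        model.predict_proba(row.reshape(1, -1))
        timings.append((time.perf_counter() - t0) * 1000.0)
    return {p: float(np.percentile(timings, p)) for p in (50, 95, 99)}

percentiles = latency_profile(model, X)
print(percentiles)
print("within budget:", percentiles[99] < LATENCY_BUDGET_MS)
```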

Benchmarking requires that experiments be comparable, measurable, and reproducible. With so many potential combinations of data, modeling techniques, software, and hyperparameters, how a data scientist approaches the experimental environment is important.

Containers offer a framework for executing multiple runs of a modeling or scoring procedure, all with exactly the same experimental setup. This allows the data scientist to use aggregation and averaging to develop a more robust statistic on a desired attribute. Aspects of the experiment can also be incrementally altered in a controlled manner. With the right compute hardware, multiple simultaneous experiments can be run, significantly reducing the time to iterate. Beyond evaluating predictive power, many container technologies provide tools for recording the operational characteristics of training a model and running predictions. These include memory used, storage IO performance, and CPU utilization.
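As a sketch of what repeated, controlled runs can look like, the snippet below shells out to the Docker CLI to execute the same training command several times under a fixed CPU and memory allocation, then aggregates the wall-clock times. The image name and training script are hypothetical placeholders.

```python
# Sketch: repeat the same containerized training run under fixed resource
# limits and aggregate wall-clock times. "benchmark-image" and "train.py"
# are hypothetical placeholders.
import statistics
import subprocess
import time

def timed_container_run(image, command, cpus=2, memory="4g"):
    """Run one container with pinned resources; return wall time in seconds."""
    t0 = time.perf_counter()
    subprocess.run(
        ["docker", "run", "--rm", f"--cpus={cpus}", f"--memory={memory}", image, *command],
        check=True,
    )
    return time.perf_counter() - t0

runs = [
    timed_container_run("benchmark-image", ["python", "train.py"])
    for _ in range(5)
]
print(f"mean: {statistics.mean(runs):.1f}s, stdev: {statistics.stdev(runs):.1f}s")
```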

Often, machine learning models will behave differently given variations in the environment. For example, models that rely on matrix calculations will perform differently when the number of available CPU cores, the number of GPUs, or the amount of available system memory is altered. The same is true when the underlying BLAS libraries, such as OpenBLAS, ATLAS, or Intel MKL, are interchanged.
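One lightweight way to probe this is to vary the BLAS thread count through environment variables (most builds respect OMP_NUM_THREADS, OPENBLAS_NUM_THREADS, or MKL_NUM_THREADS) and re-run the same workload in a fresh process, since the setting must be in place before the library loads. A sketch, with an arbitrarily chosen matrix size:

```python
# Sketch: time the same matrix multiply under different BLAS thread counts.
# The thread-count variables must be set before NumPy loads its BLAS,
# hence a fresh subprocess per configuration.
import os
import subprocess
import sys

WORKLOAD = (
    "import time, numpy as np; "
    "a = np.random.rand(2000, 2000); "
    "t0 = time.perf_counter(); a @ a; "
    "print(f'{time.perf_counter() - t0:.2f}s')"
)

for threads in (1, 2, 4, 8):
    env = dict(os.environ)
    for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
        env[var] = str(threads)
    print(f"{threads} thread(s):", end=" ")
    subprocess.run([sys.executable, "-c", WORKLOAD], env=env, check=True)
```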

Containers allow the data scientist to test multiple benchmark configurations, all with different assumptions baked into the compute environment, and to do so in a reproducible manner.
