Predicting Wine Quality with Azure ML and R

In machine learning, classification is the problem of identifying which class or group a new observation belongs to by learning from observations whose classes are already known. In what follows, I build a classification experiment in Azure ML Studio to predict wine quality from physicochemical data. Several classification algorithms are applied to the data set and their performance is compared. I also present a tutorial on how to do a similar exercise using MRS (Microsoft R Server, formerly Revolution R Enterprise). I use the Wine Quality data set from the UCI Machine Learning Repository. The data set contains quality ratings (labels) for 1,599 red wine samples. The features are the wines' physical and chemical properties (11 predictors), and we want to use these properties to predict the quality of the wine. The experiment is shown below and can be found in the Cortana Intelligence Gallery.

Several classification algorithms are available in Azure ML: Multiclass Decision Forest, Multiclass Decision Jungle, Multiclass Logistic Regression, Multiclass Neural Network, and One-vs-All Multiclass, which creates a multiclass classification model from an ensemble of binary classification models. Each of these algorithms has its advantages. A Decision Forest is an ensemble of randomly trained decision trees. Ensemble models generally provide better coverage and accuracy than single decision trees, because building multiple random decision trees and training them independently improves generalization and resilience to noisy data. Decision Jungles are a recent extension of decision forests; they require less memory and can generalize considerably better. Given a sufficient number of hidden layers and nodes, neural networks can approximate a very wide range of functions. However, they can be computationally expensive to train and have many hyperparameters to tune. Multiclass Logistic Regression extends logistic regression to more than two classes and predicts the probability of each outcome. The best way to find out which algorithm performs best on a given problem is to try them!
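To make the One-vs-All idea concrete, here is a minimal pure-Python sketch. It is not the Azure ML module or its API. One binary logistic-regression model is trained per class, and a new observation is assigned to the class whose model scores it highest. The function names, the toy two-feature data, the class names "a", "b" and "c", and the learning-rate and epoch settings are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_binary(X, y, epochs=1000, lr=0.1):
    """Fit a binary logistic regression by batch gradient descent (y is 0/1)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def train_one_vs_all(X, labels):
    """Train one 'this class vs. everything else' model per distinct label."""
    models = {}
    for c in sorted(set(labels)):
        y = [1 if label == c else 0 for label in labels]
        models[c] = train_binary(X, y)
    return models


def predict_one_vs_all(models, x):
    """Return the class whose binary model gives x the highest linear score."""
    def score(model):
        w, b = model
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=lambda c: score(models[c]))


# Toy, made-up data: three well-separated clusters in two features.
X = [(0.0, 0.2), (0.3, 0.0), (0.1, 0.4),
     (3.0, 0.1), (3.2, 0.3), (2.8, 0.0),
     (0.1, 3.0), (0.3, 3.2), (0.0, 2.9)]
labels = ["a"] * 3 + ["b"] * 3 + ["c"] * 3

models = train_one_vs_all(X, labels)
print(predict_one_vs_all(models, (3.1, 0.2)))
```

Any binary learner could stand in for the logistic regression here; the wrapper only needs a per-class score to compare.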

The original data had several labels, some of which had very few instances.
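A quick way to spot such sparsely populated labels is to count the instances per label. The sketch below uses made-up quality scores, not the UCI data, and a hypothetical helper `rare_labels` with an arbitrary threshold `min_count`.

```python
from collections import Counter


def rare_labels(labels, min_count):
    """Return the labels that occur fewer than min_count times, sorted."""
    counts = Counter(labels)
    return sorted(label for label, n in counts.items() if n < min_count)


# Made-up quality scores for illustration only; not the real data set.
toy_quality = [5, 5, 6, 6, 6, 5, 7, 6, 5, 3, 8, 6, 5, 7, 6]

print(Counter(toy_quality))                   # instances per label
print(rare_labels(toy_quality, min_count=3))  # labels with fewer than 3 instances
```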


