Predicting Wine Quality with Azure ML and R

In machine learning, classification is the problem of identifying the class or group to which a new observation belongs, by learning from observations whose classes are already known. In what follows, I will build a classification experiment in Azure ML Studio to predict wine quality from physicochemical data. Several classification algorithms will be applied to the data set and their performance compared. I will also present a tutorial on how to do a similar exercise using MRS (Microsoft R Server, formerly Revolution R Enterprise). I will use the wine quality data set from the UCI Machine Learning Repository. The data set contains quality ratings (labels) for 1,599 red wine samples. The features are the wines' physical and chemical properties (11 predictors). We want to use these properties to predict the quality of the wine. The experiment is shown below and can be found in the Cortana Intelligence Gallery.

Several classification algorithms are available in Azure ML: Multiclass Decision Forest, Multiclass Decision Jungle, Multiclass Logistic Regression, Multiclass Neural Network, and One-vs-All Multiclass, which creates a multiclass classification model from an ensemble of binary classification models. Each of these algorithms has its advantages. A Decision Forest is an ensemble of randomly trained decision trees. Ensemble models generally provide better coverage and accuracy than single decision trees, and building multiple random decision trees and training them independently improves generalization and resilience to noisy data. Decision Jungles are a recent extension of decision forests. They require less memory and generalize considerably better. Given a sufficient number of hidden layers and nodes, neural networks can, in principle, approximate any function. However, they can be computationally expensive because of the number of hyperparameters to tune. Multiclass Logistic Regression is an extension of Logistic Regression and predicts the probability of each outcome. The best way to find out which algorithm will perform best is to try them!
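To make the One-vs-All idea concrete, here is a minimal sketch in plain Python. It is not how the Azure ML module is implemented. It assumes a simple logistic regression trained by gradient descent as the binary learner, and the function names, the learning rate, the epoch count and the two-feature toy data are all hypothetical choices made for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_binary(X, y, lr=1.0, epochs=2000):
    """Fit a binary logistic regression by batch gradient descent.

    Returns (weights, bias). lr and epochs are illustrative defaults.
    """
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def train_one_vs_all(X, labels):
    """Train one binary model per class: that class versus all the others."""
    models = {}
    for cls in sorted(set(labels)):
        y = [1.0 if lab == cls else 0.0 for lab in labels]
        models[cls] = train_binary(X, y)
    return models

def predict_one_vs_all(models, x):
    """Return the class whose binary model assigns the highest probability."""
    def score(cls):
        w, b = models[cls]
        return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return max(models, key=score)

# Toy, made-up data with two features and three classes (not the wine data).
X = [[0.0, 0.0], [0.1, 0.1], [1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["low", "low", "medium", "medium", "high", "high"]

models = train_one_vs_all(X, labels)
predictions = [predict_one_vs_all(models, x) for x in X]
new_prediction = predict_one_vs_all(models, [0.95, 0.05])
print(predictions, new_prediction)
```

Each class gets its own binary model, and the final prediction is the class whose model is most confident. Any binary classifier that outputs a score could stand in for the logistic regression used here.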

The original data had several labels, some of which had very few instances.
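One quick way to spot sparsely populated labels is to count the instances per label. The sketch below uses a short, made-up list of quality scores rather than the actual data set, and the `min_count` threshold and helper names are assumptions chosen for illustration.

```python
from collections import Counter

def label_counts(labels):
    """Return a dict mapping each label to its number of instances, sorted by label."""
    return dict(sorted(Counter(labels).items()))

def rare_labels(labels, min_count):
    """Return the labels that occur fewer than min_count times, sorted."""
    counts = Counter(labels)
    return sorted(lab for lab, n in counts.items() if n < min_count)

# Hypothetical quality scores for illustration only (not real wine data).
sample_quality = [5, 6, 5, 7, 3, 6, 5, 6, 8, 5, 6, 6]

counts = label_counts(sample_quality)
sparse = rare_labels(sample_quality, min_count=2)
print(counts)
print(sparse)
```

Labels flagged this way are candidates for special handling before training, since a classifier sees too few examples of them to learn much.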


