
Predicting Wine Quality with Azure ML and R

In machine learning, classification is the problem of identifying which class or group a new observation belongs to, by learning from observations whose classes are already known. In what follows, I will build a classification experiment in Azure ML Studio to predict wine quality from physicochemical data. Several classification algorithms will be applied to the dataset and their performance will be compared. I will also present a tutorial on how to do a similar exercise using MRS (Microsoft R Server, formerly Revolution R Enterprise). I will use the wine quality dataset from the UCI Machine Learning Repository. The dataset contains quality ratings (labels) for 1599 red wine samples. The features are the wines' physical and chemical properties (11 predictors). We want to use these properties to predict the quality of the wine. The experiment is shown below and can be found in the Cortana Intelligence Gallery.
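As an aside for readers who want to look at the raw data outside Azure ML Studio, here is a minimal Python sketch of separating the 11 predictors from the quality label. It assumes the semicolon-delimited layout and column names of the UCI red wine file. The helper name `load_wine_rows` is hypothetical, and the two data rows are embedded only for illustration so the snippet runs without a download.

```python
import csv
import io

# Two illustrative rows in the semicolon-delimited layout of the UCI file.
# Embedded here as an assumption so the example needs no network access.
SAMPLE = "\n".join([
    '"fixed acidity";"volatile acidity";"citric acid";"residual sugar";'
    '"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";'
    '"pH";"sulphates";"alcohol";"quality"',
    "7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5",
    "7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5",
])


def load_wine_rows(text):
    """Split each row into a dict of float features and an integer label."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    features, labels = [], []
    for row in reader:
        # The last column is the quality rating; everything else is a predictor.
        labels.append(int(row.pop("quality")))
        features.append({name: float(value) for name, value in row.items()})
    return features, labels


features, labels = load_wine_rows(SAMPLE)
print(len(features[0]), "predictors per sample; labels:", labels)
```

To use it on the full dataset, the same function could be given the contents of the downloaded file instead of `SAMPLE`.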

There are several classification algorithms available in Azure ML: Multiclass Decision Forest, Multiclass Decision Jungle, Multiclass Logistic Regression, Multiclass Neural Network, and One-vs-All Multiclass, which creates a multiclass classification model from an ensemble of binary classification models. Each of these algorithms has its advantages. A Decision Forest is an ensemble of randomly trained decision trees. Ensemble models in general provide better coverage and accuracy than single decision trees, and building multiple random decision trees and training them independently improves generalization and resilience to noisy data. Decision Jungles are a recent extension of decision forests. They require less memory and have considerably improved generalization. Given a sufficient number of hidden layers and nodes, neural networks can approximate virtually any function. However, they can be computationally expensive to train and have many hyperparameters to tune. Multiclass Logistic Regression extends logistic regression to more than two classes and predicts the probability of each outcome. The best way to find out which algorithm will perform best is to try them!
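To make the One-vs-All idea concrete, here is a minimal sketch in plain Python. It is not how Azure ML implements the module. It assumes a simple logistic regression trained by stochastic gradient descent as the binary learner, and the function names, the class names, and the toy two-feature data are all made up for illustration. One binary model is trained per class ("this class" versus "everything else"), and the class whose model gives the highest score wins.

```python
import math


def train_binary_logistic(X, y, epochs=300, lr=0.1):
    """Fit a binary logistic regression with plain SGD. Returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi  # gradient of the log loss with respect to z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def train_one_vs_all(X, labels):
    """Train one binary model per class: that class (1) versus the rest (0)."""
    models = {}
    for cls in sorted(set(labels)):
        y_binary = [1 if lab == cls else 0 for lab in labels]
        models[cls] = train_binary_logistic(X, y_binary)
    return models


def predict_one_vs_all(models, x):
    """Return the class whose binary model gives the highest raw score."""
    def score(cls):
        w, b = models[cls]
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=score)


# Toy data (hypothetical): three well-separated clusters, two features each.
X = [[0.0, 0.0], [0.5, 0.2], [0.2, 0.6], [0.4, 0.4],
     [4.0, 0.0], [4.2, 0.5], [3.8, 0.3], [4.5, 0.1],
     [0.0, 4.0], [0.3, 4.2], [0.5, 3.8], [0.1, 4.5]]
labels = ["low"] * 4 + ["medium"] * 4 + ["high"] * 4

models = train_one_vs_all(X, labels)
print(predict_one_vs_all(models, [4.1, 0.2]))
```

Because the wrapper only needs a binary learner that returns a score, any two-class algorithm could be substituted for the logistic regression used here.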


The original data had several labels, some of which had very few instances.
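A quick way to spot such sparsely populated labels is to count how often each one occurs. The sketch below uses a made-up list of quality ratings, not the real distribution from the dataset, and the helper `rare_labels` and its `min_count` threshold are hypothetical choices for illustration.

```python
from collections import Counter


def rare_labels(labels, min_count):
    """Return the labels that occur fewer than min_count times, sorted."""
    counts = Counter(labels)
    return sorted(lab for lab, n in counts.items() if n < min_count)


# Toy quality ratings (hypothetical), deliberately imbalanced.
toy_quality = [5] * 10 + [6] * 9 + [7] * 4 + [4] * 2 + [3] * 1 + [8] * 1

counts = Counter(toy_quality)
rare = rare_labels(toy_quality, min_count=3)

# Print each label with its count, most frequent first.
for label, n in counts.most_common():
    print(label, n)
print("labels with fewer than 3 instances:", rare)
```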


