Regression Model for Wine Quality Using R

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Wine Quality dataset can be treated as a regression problem in which we try to predict the quality rating of each wine.

INTRODUCTION: The two datasets relate to the red and white variants of the Portuguese "Vinho Verde" wine. Due to privacy and logistical issues, only physicochemical (input) and sensory (output) variables are available (e.g., there is no data about grape types, wine brand, or selling price). The goal is to model wine quality based on physicochemical tests.

CONCLUSION: The baseline performance of the seven algorithms achieved an average RMSE of 0.7119. Three algorithms (Support Vector Machine, Random Forest, and Stochastic Gradient Boosting) achieved the best (lowest) RMSE scores after the first round of modeling. After a series of tuning trials, Random Forest turned in the top result on the training data, with an average RMSE of 0.6088. Using the optimized tuning parameter, the Random Forest model scored an RMSE of 0.6416 on the validation dataset, slightly worse than its training RMSE. For this project, the Random Forest ensemble algorithm yielded consistently strong training and validation results, which justify the additional processing the algorithm requires.
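For reference, RMSE (root mean squared error) is the metric behind all of the scores above. It is measured in the same units as the quality rating, and lower values are better, which is why the validation RMSE of 0.6416 counts as slightly worse than the training RMSE of 0.6088. The standard definition, where y_i is the observed quality rating of wine i, ŷ_i is the model's prediction for that wine, and n is the number of wines scored, is:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
$$

Because the errors are squared before averaging, a few badly mispredicted wines raise RMSE more than many small misses would.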

Dataset Used: Wine Quality Data Set

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/wine+quality

One potential source of performance benchmarks: https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009

The HTML-formatted report is available on GitHub.