Regression Model for Wine Quality Using Python Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

Dataset Used: Wine Quality Data Set

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/wine+quality

One potential source of performance benchmarks: https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009

INTRODUCTION: The two datasets are related to red and white variants of the Portuguese “Vinho Verde” wine. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.). The goal is to model wine quality based on physicochemical tests.

For this iteration of the project, we will perform the modeling using only the data from the red wine. For the subsequent iterations, we will analyze the white wine data and the combined data from both types of wine.
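As a rough illustration of the starting point for this iteration, the sketch below reads red wine records and separates the 11 physicochemical inputs from the quality score. It assumes the semicolon-delimited layout of the UCI file, with "quality" as the last column. The helper name `load_wine` is hypothetical, and the two inline rows are illustrative stand-ins in the file's layout rather than the project's actual loading code or data.

```python
import csv
import io

# Two illustrative rows in the assumed semicolon-delimited layout of the
# UCI red wine file. They are placeholders for demonstration only.
SAMPLE = '''"fixed acidity";"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"
7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5
'''


def load_wine(handle, target="quality"):
    """Return (feature_names, X, y) from a semicolon-delimited wine file.

    `load_wine` is a hypothetical helper written for this sketch.
    """
    reader = csv.DictReader(handle, delimiter=";")
    # Every column except the target is treated as a numeric input.
    feature_names = [name for name in reader.fieldnames if name != target]
    X, y = [], []
    for row in reader:
        X.append([float(row[name]) for name in feature_names])
        y.append(float(row[target]))
    return feature_names, X, y


feature_names, X, y = load_wine(io.StringIO(SAMPLE))
print(len(feature_names), "features;", len(X), "rows")
```

To read a downloaded copy of the file instead, the same function could be given an open file handle, for example `open("winequality-red.csv", newline="")`, where the local file name is an assumption.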

CONCLUSION: The baseline run of the 11 algorithms produced an average RMSE of 0.5094. The four ensemble algorithms (AdaBoost, Extra Trees, Random Forest, and Stochastic Gradient Boosting) achieved the lowest RMSE scores after the first round of modeling. After a series of tuning trials, Extra Trees turned in the best result on the training data, with an average RMSE of 0.3453. Using the optimized tuning parameters, the Extra Trees algorithm processed the validation dataset with an RMSE of 0.3089, which was even lower than the average RMSE on the training data. For this project, the Extra Trees ensemble algorithm yielded consistently strong training and validation results, which warrant the additional processing the algorithm requires.
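Since every figure in the conclusion is an RMSE, where lower is better, a minimal sketch of the metric may help. The summary does not state how the averages were computed, so the code assumes a plain mean of per-fold RMSE values. The function names and the quality scores below are hypothetical and are not taken from the project.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mean_rmse(folds):
    """Average the RMSE over (actual, predicted) pairs, one pair per fold."""
    scores = [rmse(actual, predicted) for actual, predicted in folds]
    return sum(scores) / len(scores)


# Hypothetical quality scores and predictions for two folds.
folds = [
    ([5, 6, 7], [5, 6, 7]),  # perfect predictions, RMSE 0.0
    ([5, 6], [6, 5]),        # each prediction off by one point, RMSE 1.0
]
print(mean_rmse(folds))  # 0.5
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, and the result stays in the same units as the quality score.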

The HTML-formatted report can be found on GitHub.