Regression Model for Wine Quality Using Python Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

Dataset Used: Wine Quality Data Set

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/wine+quality

One potential source of performance benchmarks: https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009

INTRODUCTION: The two datasets are related to red and white variants of the Portuguese “Vinho Verde” wine. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.). The goal is to model wine quality based on physicochemical tests.

In the previous iteration, the 11 baseline algorithms achieved an average RMSE of 0.5094. The four ensemble algorithms (AdaBoost, Extra Trees, Random Forest, and Stochastic Gradient Boosting) achieved the best (lowest) RMSE scores after the first round of modeling. After a series of tuning trials, Extra Trees turned in the best result on the training data, with an average RMSE of 0.3453. With the tuned parameters, the Extra Trees model processed the validation dataset with an RMSE of 0.3089, which was even better than the RMSE on the training data.
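Because every result in this write-up is reported as RMSE (root mean squared error), a minimal sketch of the metric may help. RMSE is expressed in the same units as the quality score, and lower values are better. The function name `rmse` and the sample scores below are made up for illustration; they are not taken from the project code or from the dataset.

```python
import math


def rmse(actual, predicted):
    """Return the root mean squared error of two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")
    # Square each residual, average the squares, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical quality scores, for illustration only.
actual = [5, 6, 7, 5]
predicted = [5.5, 6.0, 6.0, 5.5]
print(round(rmse(actual, predicted), 4))  # 0.6124
```

Read this way, an RMSE of 0.3089 means the predictions miss the actual quality score by roughly a third of a point, with larger misses weighted more heavily than smaller ones.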

For this iteration of the project, we will perform the modeling using only the data for the white wine. In subsequent iterations, we will analyze the combined data from both types of wine.

CONCLUSION: The 11 baseline algorithms achieved an average RMSE of 0.6111. The four ensemble algorithms (AdaBoost, Extra Trees, Random Forest, and Stochastic Gradient Boosting) achieved the best (lowest) RMSE scores after the first round of modeling. After a series of tuning trials, Extra Trees turned in the best result on the training data, with an average RMSE of 0.3869. With the tuned parameters, the Extra Trees model processed the validation dataset with an RMSE of 0.3574, which was even better than the RMSE on the training data. For this project, the Extra Trees ensemble algorithm yielded the lowest RMSE on both the training and validation data, which warrants the additional processing required by the algorithm.
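The tune-then-validate workflow summarized above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual script: it uses scikit-learn, a synthetic dataset from `make_regression` as a stand-in for the wine data, and an arbitrary small grid for `n_estimators`. The split ratio, fold count, and random seed are also assumptions, so the printed numbers will not match the figures reported here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (NOT the wine data); 11 features mirrors the
# number of physicochemical input attributes in the Wine Quality data set.
X, y = make_regression(n_samples=300, n_features=11, noise=10.0, random_state=7)

# Hold out a validation split that the tuning step never sees.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=7
)

# Tune the number of trees with 5-fold cross-validation on the training split.
# scikit-learn maximizes scores, so RMSE is exposed as its negative.
grid = GridSearchCV(
    ExtraTreesRegressor(random_state=7),
    param_grid={"n_estimators": [50, 100]},  # assumed grid, for illustration
    scoring="neg_root_mean_squared_error",
    cv=5,
)
grid.fit(X_train, y_train)

# Average RMSE across the folds for the best parameter setting.
train_cv_rmse = float(-grid.best_score_)

# GridSearchCV refits the best model on the full training split by default;
# score that model on the held-out validation split.
val_rmse = float(np.sqrt(mean_squared_error(y_val, grid.predict(X_val))))

print(f"best params: {grid.best_params_}")
print(f"average cross-validated RMSE (training): {train_cv_rmse:.4f}")
print(f"validation RMSE: {val_rmse:.4f}")
```

Keeping the validation split out of the grid search is what makes the final RMSE a fair check on the tuned model; comparing it with the cross-validated training RMSE is the same comparison made in the conclusion.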

In summary, the quality predictions for the red wines appear to be slightly more accurate than those for the white wines.

The HTML-formatted report is available on GitHub.