Binary Classification Model for Edible Mushrooms Using R

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

Dataset Used: Mushroom Data Set

Dataset ML Model: Binary classification with categorical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Mushroom

One potential source of performance benchmarks: https://www.kaggle.com/uciml/mushroom-classification

INTRODUCTION: This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota family, drawn from The Audubon Society Field Guide to North American Mushrooms (G. H. Lincoff (Pres.), New York: Alfred A. Knopf, 1981), pp. 500-525. Each species is identified as definitely edible or definitely poisonous. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom.

CONCLUSION: Baseline performance in predicting the class variable averaged 98.65% accuracy, which was very encouraging. Four algorithms (Logistic Regression, Random Forest, AdaBoost, and Stochastic Gradient Boosting) tied for the top result, each reaching 100% accuracy on the training dataset. The training dataset contained 65% of the records from the original dataset (5,282 records), and the validation dataset held the remaining 35% (2,842 records).
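
The split sizes quoted above can be checked with a little arithmetic. The sketch below is written in Python rather than the R used for the project, purely as an illustration. It assumes (the post does not say so) a stratified split in which each class is partitioned separately and the training share is rounded up. The class counts of 4,208 edible and 3,916 poisonous records come from the UCI data set documentation, not from this post, and the function name is hypothetical.

```python
import math

# Class counts as documented for the UCI Mushroom data set
# (an outside fact, not stated in this post).
CLASS_COUNTS = {"edible": 4208, "poisonous": 3916}

# Training share reported in the conclusion.
TRAIN_FRACTION = 0.65


def stratified_split_sizes(class_counts, train_fraction):
    """Return (train_size, validation_size) for a per-class split.

    ASSUMPTION: each class is split on its own and the training share
    of each class is rounded up to a whole record.
    """
    train = sum(math.ceil(n * train_fraction) for n in class_counts.values())
    total = sum(class_counts.values())
    return train, total - train


train_size, validation_size = stratified_split_sizes(CLASS_COUNTS, TRAIN_FRACTION)
print(train_size, validation_size)  # 5282 2842
```

Under those assumptions the sketch reproduces the reported 5,282 and 2,842 records. A plain 65% cut of all 8,124 records gives 5,280.6, which does not round to 5,282, so a per-class split is one plausible explanation for the reported figures.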

When the four top-performing algorithms were applied to the validation dataset, all four again achieved 100% accuracy. Because the Logistic Regression model required the least training time, it is the recommended model for future mushroom predictions.
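
The selection rule described above, highest validation accuracy first and shortest training time as the tie-breaker, can be sketched as follows. This is an illustrative Python sketch, not the project's R code. The accuracies are the 100% figures reported here, but the training times are hypothetical placeholders because the post does not report them, and the function and field names are made up for the example.

```python
# Validation accuracy comes from the text; train_seconds values are
# HYPOTHETICAL placeholders, since the post does not report timings.
candidates = [
    {"model": "Logistic Regression", "accuracy": 1.0, "train_seconds": 1.0},
    {"model": "Random Forest", "accuracy": 1.0, "train_seconds": 2.0},
    {"model": "AdaBoost", "accuracy": 1.0, "train_seconds": 3.0},
    {"model": "Stochastic Gradient Boosting", "accuracy": 1.0, "train_seconds": 4.0},
]


def pick_model(models):
    """Keep the models tied at the best accuracy, then take the fastest to train."""
    best_accuracy = max(m["accuracy"] for m in models)
    tied = [m for m in models if m["accuracy"] == best_accuracy]
    return min(tied, key=lambda m: m["train_seconds"])


chosen = pick_model(candidates)
print(chosen["model"])
```

With real timings substituted for the placeholders, the same rule would pick whichever of the tied models trained fastest.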

The HTML-formatted report is available on GitHub.