Ensemble Classification Model for the Sonar Dataset with R

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

For more information on this case study project, please consult Dr. Brownlee’s blog post at https://machinelearningmastery.com/standard-machine-learning-datasets/.

Dataset Used: Connectionist Bench (Sonar, Mines vs. Rocks) Data Set

ML Model: Classification, numeric inputs

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+%28Sonar%2C+Mines+vs.+Rocks%29

The Sonar dataset involves predicting whether an object is a mine or a rock, given the strength of sonar returns at different angles. It is a binary (2-class) classification problem.

CONCLUSION: The baseline performance of predicting the most prevalent class achieved an accuracy of approximately 76.0%. The top result achieved via SVM was approximately 85.06% after a series of tuning runs. The Random Forest (RF) ensemble algorithm, also after tuning, yielded an accuracy of 85.09%. The improvement of RF over SVM was too small to justify the additional processing and tuning required by the ensemble algorithm.
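
To make the idea of a "most prevalent class" baseline concrete, the sketch below computes it in base R. The function name majority_baseline and the toy label vector are hypothetical, introduced only for illustration; they are not taken from the project's script and the toy labels are not the Sonar data.

```r
# Minimal sketch of a majority-class baseline (hypothetical helper, base R only).
majority_baseline <- function(labels) {
  counts <- table(labels)                      # frequency of each class
  list(
    class    = names(counts)[which.max(counts)],  # most prevalent class
    accuracy = max(counts) / length(labels)       # accuracy if we always predict it
  )
}

# Made-up labels for illustration only (not the Sonar data).
toy_labels <- factor(c("M", "M", "M", "R", "R"))
baseline <- majority_baseline(toy_labels)
print(baseline$class)
print(baseline$accuracy)
```

Any tuned model should be judged against this kind of number, since a classifier that cannot beat it has learned nothing useful from the inputs.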

The purpose of this project is to analyze a dataset using various machine learning algorithms and to document the steps using a template. The project aims to touch on the following areas:

  • Document a classification predictive modeling problem end-to-end
  • Explore data transformation options for improving model performance
  • Explore algorithm tuning techniques for improving model performance
  • Explore using and tuning ensemble methods for improving model performance
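
The areas listed above could be sketched in R roughly as follows. This is a sketch under stated assumptions, not the project's actual script: it assumes the caret, mlbench, kernlab, and randomForest packages are installed, it loads the copy of the Sonar data shipped with mlbench, and the seed, resampling scheme, and tuning grids are arbitrary illustrative choices.

```r
# Assumption: caret, mlbench, kernlab, and randomForest are installed.
library(caret)
library(mlbench)

data(Sonar)  # 60 numeric inputs plus the factor target `Class` (M or R)

# Repeated 10-fold cross-validation; the settings are illustrative only.
control <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# SVM with a radial kernel. Centering and scaling is one data transformation
# option; the sigma and C values are an arbitrary example grid.
set.seed(7)
fit_svm <- train(Class ~ ., data = Sonar, method = "svmRadial",
                 metric = "Accuracy",
                 preProcess = c("center", "scale"),
                 tuneGrid = expand.grid(sigma = c(0.01, 0.025, 0.05),
                                        C = c(1, 2, 4)),
                 trControl = control)

# Random forest ensemble, tuned over mtry (variables tried at each split).
# Resetting the seed gives both models the same resampling folds.
set.seed(7)
fit_rf <- train(Class ~ ., data = Sonar, method = "rf",
                metric = "Accuracy",
                tuneGrid = expand.grid(mtry = c(2, 4, 8, 16)),
                trControl = control)

# Compare the cross-validated accuracy of the two tuned models.
results <- resamples(list(SVM = fit_svm, RF = fit_rf))
summary(results)
```

Comparing both models on the same resampling folds is what makes a small gap between them, such as the one reported in the conclusion, meaningful to interpret.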

The HTML-formatted report is available on GitHub.