Binary-Class Classification Model for Seismic Bumps Take 1 Using R

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Seismic Bumps Data Set poses a binary classification problem in which we try to predict one of two possible outcomes.

INTRODUCTION: Mining activity has always been connected with the occurrence of dangers that are commonly called mining hazards. A special case of such a threat is a seismic hazard, which frequently occurs in many underground mines. Seismic hazards are among the hardest natural hazards to detect and predict, and in this respect they are comparable to earthquakes. The complexity of seismic processes and the large disproportion between the number of low-energy seismic events and the number of high-energy phenomena make statistical techniques insufficient for predicting seismic hazards. Therefore, it is essential to search for new opportunities for better hazard prediction, including machine learning methods.

CONCLUSION: The baseline performance of the eight algorithms achieved an average accuracy of 93.11%. Three algorithms (Random Forest, Support Vector Machine, and AdaBoost) achieved the top three accuracy scores after the first round of modeling. After a series of tuning trials, all three algorithms turned in an identical accuracy of 93.42%, with an identical Kappa score of 0.0. A Kappa of 0.0 means the models agreed with the true labels no more often than chance would, which is consistent with predicting the majority class for every case.
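To illustrate how a high accuracy can coincide with a Kappa of zero, here is a minimal base R sketch. It assumes the class counts listed in the UCI dataset description (2,414 non-hazardous and 170 hazardous cases), which are not stated in this report, and it uses a hypothetical model that always predicts the majority class. It does not reproduce the tuned models from the project.

```r
# Assumption: class counts taken from the UCI dataset description,
# not from this report (2,414 non-hazardous = 0, 170 hazardous = 1).
actual <- factor(c(rep(0, 2414), rep(1, 170)), levels = c(0, 1))

# Hypothetical model that predicts the majority class for every case.
predicted <- factor(rep(0, length(actual)), levels = c(0, 1))

# Confusion matrix: rows are predictions, columns are true labels.
cm <- table(Predicted = predicted, Actual = actual)
n <- sum(cm)

# Observed agreement (plain accuracy).
observed <- sum(diag(cm)) / n

# Agreement expected by chance, from the row and column totals.
expected <- sum(rowSums(cm) * colSums(cm)) / n^2

# Cohen's Kappa: agreement beyond chance, scaled by the maximum possible.
cohen_kappa <- (observed - expected) / (1 - expected)

cat("Accuracy (%):", round(observed * 100, 2), "\n")
cat("Kappa:", round(cohen_kappa, 4), "\n")
```

Under these assumptions the observed and chance agreement are equal, so Kappa comes out at zero even though accuracy is above 93%.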

Because the dataset is imbalanced, accuracy alone is a misleading measure here. We will need to look for another metric or another approach to evaluate the models.
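As one possible direction, the sketch below computes sensitivity, specificity, and balanced accuracy alongside plain accuracy in base R. The helper name class_metrics and the toy labels are hypothetical and chosen only for illustration; they are not part of the original project or its data.

```r
# Hypothetical helper: metrics that expose poor minority-class performance.
class_metrics <- function(actual, predicted, positive) {
  tp <- sum(predicted == positive & actual == positive)
  tn <- sum(predicted != positive & actual != positive)
  fp <- sum(predicted == positive & actual != positive)
  fn <- sum(predicted != positive & actual == positive)

  sensitivity <- tp / (tp + fn)  # share of positive cases that were found
  specificity <- tn / (tn + fp)  # share of negative cases that were found

  c(accuracy          = (tp + tn) / length(actual),
    sensitivity       = sensitivity,
    specificity       = specificity,
    balanced_accuracy = (sensitivity + specificity) / 2)
}

# Toy, made-up labels: 18 negatives, 2 positives, and a model that
# always predicts the majority class.
actual    <- c(rep("no", 18), rep("yes", 2))
predicted <- rep("no", 20)

metrics <- class_metrics(actual, predicted, positive = "yes")
print(metrics)
```

In this toy case accuracy looks strong, while sensitivity and balanced accuracy show that the minority class is never detected, which is the kind of signal a follow-up evaluation could rely on.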

Dataset Used: Seismic Bumps Data Set

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/seismic-bumps

The HTML-formatted report can be found here on GitHub.