Simple Classification Model for Bank Marketing Using Python

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

Dataset Used: Bank Marketing Dataset

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference: http://archive.ics.uci.edu/ml/datasets/bank+marketing

One source of potential performance benchmarks: https://www.kaggle.com/rouseguy/bankbalanced

INTRODUCTION: The Bank Marketing dataset involves predicting whether a bank client will subscribe (yes/no) to a term deposit (the target variable). It is a binary (2-class) classification problem. There are over 45,000 observations with 16 input variables and 1 output variable. There are no missing values in the dataset.
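Because the inputs mix numerical and categorical attributes, most algorithms need the categorical columns converted to numbers first. The following is a minimal sketch of that step, not the preprocessing used in this project: the helper name encode_rows is hypothetical, the two sample rows are made up for illustration, and only a small subset of columns is shown. A real project would more likely rely on pandas or scikit-learn for this.

```python
def encode_rows(rows, categorical, target="y"):
    """One-hot encode the categorical columns and map the yes/no target to 1/0."""
    # Collect the sorted set of levels observed for each categorical column.
    levels = {col: sorted({row[col] for row in rows}) for col in categorical}

    features, labels = [], []
    for row in rows:
        encoded = {}
        for col, value in row.items():
            if col == target:
                continue  # the target is handled separately below
            if col in categorical:
                # One indicator column per observed level.
                for level in levels[col]:
                    encoded[f"{col}={level}"] = 1 if value == level else 0
            else:
                # Numerical attributes pass through as floats.
                encoded[col] = float(value)
        features.append(encoded)
        labels.append(1 if row[target] == "yes" else 0)
    return features, labels


# Made-up rows for illustration only (not taken from the dataset).
sample_rows = [
    {"age": 58, "job": "management", "marital": "married", "balance": 2143, "y": "no"},
    {"age": 33, "job": "technician", "marital": "single", "balance": 1506, "y": "yes"},
]

X, y = encode_rows(sample_rows, categorical=["job", "marital"])
```

Each categorical column expands into one indicator per observed level, so the width of the encoded feature set depends on how many distinct levels appear in the data.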

CONCLUSION: The 11 algorithms achieved an average baseline accuracy of 89.13%. Three algorithms (Stochastic Gradient Boosting, Random Forest, and AdaBoost) achieved the top accuracy and Kappa scores. The best result on the training data came from Stochastic Gradient Boosting, which achieved an average accuracy of 91.00% after a series of tuning trials and an accuracy of 90.58% on the validation dataset. For this project, the Stochastic Gradient Boosting ensemble algorithm consistently yielded the strongest training and validation results, which warrants the additional processing the algorithm requires.
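For readers less familiar with the Kappa score reported alongside accuracy, the sketch below computes both metrics from the four cells of a binary confusion matrix. The function name accuracy_and_kappa and all of the counts are hypothetical and chosen only for illustration; they are not results from this project.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa) for a binary confusion matrix."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total  # observed agreement, i.e. accuracy

    # Agreement expected by chance, from the marginal totals of each class.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((tn + fn) / total) * ((tn + fp) / total)
    expected = p_yes + p_no

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts for a model that finds some of the "yes" cases.
model_acc, model_kappa = accuracy_and_kappa(tp=40, fp=20, fn=60, tn=880)

# Hypothetical counts for a predictor that always answers "no".
naive_acc, naive_kappa = accuracy_and_kappa(tp=0, fp=0, fn=100, tn=900)
```

In this illustration the always-"no" predictor still reaches 90% accuracy but has a Kappa of zero, which is why Kappa is a useful companion metric when one class dominates.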

The HTML-formatted report can be found on GitHub.