Binary Classification Model for Credit Card Default Using R Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

Dataset Used: Default of Credit Card Clients Data Set

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

One potential source of performance benchmark: https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset

INTRODUCTION: This dataset contains information on default payments, demographic factors, credit data, history of payment, and bill statements of credit card clients in Taiwan from April 2005 to September 2005.

In the Take No.1 iteration, the baseline performance of the ten algorithms averaged 81.05% accuracy. Three algorithms (Support Vector Machine, AdaBoost, and Stochastic Gradient Boosting) achieved the top accuracy scores after the first round of modeling. After a series of tuning trials, Stochastic Gradient Boosting turned in the top result on the training data, with an average accuracy of 82.18%. Using the optimized tuning parameters, the Stochastic Gradient Boosting algorithm processed the validation dataset with an accuracy of 81.94%, just slightly lower than its accuracy on the training data.

For the Take No.2 iteration, we will bin the credit limit and age attributes and observe the effects of this transformation on the models.
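As a rough illustration, the binning step can be sketched with R's `cut()` function. This is a minimal sketch, not the project's actual code: the data frame name `creditDF`, the inline sample values, and the bin boundaries and labels are all assumptions; only the column names `LIMIT_BAL` and `AGE` follow the UCI dataset.

```r
# Hypothetical sample of the dataset; in practice creditDF would be
# read from the UCI "Default of Credit Card Clients" file.
creditDF <- data.frame(
  LIMIT_BAL = c(20000, 120000, 90000, 500000, 300000),
  AGE       = c(24, 26, 34, 57, 42)
)

# Discretize the credit limit into four categorical bins.
# Break points are illustrative assumptions, not the project's values.
creditDF$LIMIT_BAL_BIN <- cut(creditDF$LIMIT_BAL,
                              breaks = c(0, 50000, 150000, 300000, Inf),
                              labels = c("low", "medium", "high", "very_high"))

# Discretize age into decade-style categorical bins.
creditDF$AGE_BIN <- cut(creditDF$AGE,
                        breaks = c(20, 30, 40, 50, 60, Inf),
                        labels = c("20s", "30s", "40s", "50s", "60plus"))

# Inspect the distribution of observations across the new bins.
table(creditDF$LIMIT_BAL_BIN)
```

The binned factor columns would then replace (or sit alongside) the original numeric attributes before the modeling runs described below.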

CONCLUSION: The baseline performance of the ten algorithms averaged 81.07% accuracy. Three algorithms (Decision Trees, AdaBoost, and Stochastic Gradient Boosting) achieved the top accuracy scores after the first round of modeling. After a series of tuning trials, AdaBoost turned in the top result on the training data, with an average accuracy of 82.22%. Using the optimized tuning parameters, the AdaBoost algorithm processed the validation dataset with an accuracy of 82.06%, just slightly lower than its accuracy on the training data. For this round of modeling, the Stochastic Gradient Boosting ensemble algorithm again yielded consistently strong training and validation results, which warrant the additional processing the algorithm requires.

For this round of modeling, converting the credit limit and age attributes from numerical to categorical did not have a noticeable effect on the accuracy of the models.

The HTML formatted report can be found here on GitHub.