Binary Classification Model for Credit Card Default Using Python, Take 3

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

Dataset Used: Default of Credit Card Clients Data Set

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

One potential source of performance benchmarks: https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset

INTRODUCTION: This dataset contains information on default payments, demographic factors, credit data, history of payment, and bill statements of credit card clients in Taiwan from April 2005 to September 2005.

Previously, in the Take No.1 iteration, the baseline performance of the ten algorithms achieved an average accuracy of 74.38%. The group of ensemble algorithms (Bagged CART, Random Forest, Extra Trees, AdaBoost, and Stochastic Gradient Boosting) achieved the top accuracy scores after the first round of modeling. After a series of tuning trials, Stochastic Gradient Boosting turned in the top result on the training data, with an average accuracy of 81.97%. Using the optimized tuning parameters, the Stochastic Gradient Boosting algorithm processed the validation dataset with an accuracy of 82.91%, slightly better than its accuracy on the training data.
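That spot-check step is straightforward to reproduce with scikit-learn. The sketch below is a minimal version of it, not the report's exact script: the file name, header offset, split parameters, and the subset of models shown are illustrative assumptions.

```python
import pandas as pd
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)

# Hypothetical local copy of the UCI spreadsheet; the file name and header
# offset are assumptions about the download, not taken from the report.
data = pd.read_excel("default_of_credit_card_clients.xls", header=1)
y = data["default payment next month"]
X = data.drop(columns=["ID", "default payment next month"])

X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, test_size=0.2, random_state=7)

# A representative subset of the algorithms spot-checked in the report;
# each is scored with 10-fold cross-validation on the training split.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "CART": DecisionTreeClassifier(),
    "Bagged CART": BaggingClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Extra Trees": ExtraTreesClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "Stochastic GB": GradientBoostingClassifier(subsample=0.8),
}

kfold = KFold(n_splits=10, shuffle=True, random_state=7)
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=kfold,
                             scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.4f} (std {scores.std():.4f})")
```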

For the Take No.2 iteration, we converted the Sex/Gender, Education, and Marital Status attributes into categorical variables and observed the effects on the models. After the conversion, the baseline performance of the ten algorithms achieved an average accuracy of 74.39%. The group of ensemble algorithms (Bagged CART, Random Forest, Extra Trees, AdaBoost, and Stochastic Gradient Boosting) again achieved the top accuracy scores after the first round of modeling. After a series of tuning trials, Stochastic Gradient Boosting turned in the top result on the training data, with an average accuracy of 81.96%. Using the optimized tuning parameters, the Stochastic Gradient Boosting algorithm processed the validation dataset with an accuracy of 82.83%, slightly better than its accuracy on the training data.
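A minimal sketch of that kind of conversion is shown below, assuming the UCI codebook's column names (SEX, EDUCATION, MARRIAGE) and the DataFrame loaded in the earlier sketch; the exact encoding used in the report may differ.

```python
import pandas as pd

# Treat the three coded attributes as categories rather than numbers, then
# one-hot encode them so the models see indicator columns instead of codes.
# Column names follow the UCI codebook; `data` is the DataFrame loaded above.
categorical_columns = ["SEX", "EDUCATION", "MARRIAGE"]
data[categorical_columns] = data[categorical_columns].astype("category")
data = pd.get_dummies(data, columns=categorical_columns)
```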

For the Take No.3 iteration, we will bin the credit limit and age attributes and observe the effects on the models.
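One way to carry out that binning with pandas is sketched below. The column names follow the UCI codebook (LIMIT_BAL, AGE), and the bin counts and age edges are illustrative choices rather than the report's exact settings.

```python
import pandas as pd

# Discretize the credit limit into quartile bins and age into fixed ranges,
# then one-hot encode the resulting bin labels. Bin choices are illustrative;
# `data` is the DataFrame from the earlier sketches.
data["LIMIT_BAL_BIN"] = pd.qcut(data["LIMIT_BAL"], q=4, labels=False)
data["AGE_BIN"] = pd.cut(data["AGE"], bins=[20, 30, 40, 50, 60, 80],
                         right=False, labels=False)
data = data.drop(columns=["LIMIT_BAL", "AGE"])
data = pd.get_dummies(data, columns=["LIMIT_BAL_BIN", "AGE_BIN"])
```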

CONCLUSION: After the binning operation, the baseline performance of the ten algorithms achieved an average accuracy of 74.34%. The group of ensemble algorithms (Bagged CART, Random Forest, Extra Trees, AdaBoost, and Stochastic Gradient Boosting) achieved the top accuracy scores after the first round of modeling. After a series of tuning trials, Stochastic Gradient Boosting turned in the top result on the training data, with an average accuracy of 81.96%. Using the optimized tuning parameters, the Stochastic Gradient Boosting algorithm processed the validation dataset with an accuracy of 82.75%, slightly better than its accuracy on the training data.
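The tuning and validation steps can be sketched as follows, assuming the train/validation split from the first sketch; the grid values here are illustrative, not the report's exact search space.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV

# Search a small illustrative grid for the stochastic gradient boosting
# model, then score the best estimator on the held-out validation split.
param_grid = {"n_estimators": [100, 200, 300], "max_depth": [2, 3, 4]}
grid = GridSearchCV(GradientBoostingClassifier(subsample=0.8, random_state=7),
                    param_grid, scoring="accuracy", cv=10)
grid.fit(X_train, y_train)
print("Best CV accuracy:", grid.best_score_)

predictions = grid.best_estimator_.predict(X_validation)
print("Validation accuracy:", accuracy_score(y_validation, predictions))
```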

For this round of modeling, binning the credit limit and age attributes into categorical ranges did not have a noticeable effect on the accuracy of the models.

The HTML-formatted report can be found here on GitHub.