Binary Classification Model for Credit Card Default Using Python Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

Dataset Used: Default of Credit Card Clients Data Set

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

One potential source of performance benchmark: https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset

INTRODUCTION: This dataset contains information on default payments, demographic factors, credit data, history of payment, and bill statements of credit card clients in Taiwan from April 2005 to September 2005.

CONCLUSION: The baseline performance of the ten algorithms achieved an average accuracy of 74.38%. The ensemble algorithms (Bagged CART, Random Forest, Extra Trees, AdaBoost, and Stochastic Gradient Boosting) achieved the top accuracy scores after the first round of modeling. After a series of tuning trials, Stochastic Gradient Boosting turned in the top result on the training data, with an average accuracy of 81.97%. Using the best tuning parameters found, the Stochastic Gradient Boosting algorithm processed the validation dataset with an accuracy of 82.91%, slightly better than its accuracy on the training data. For this round of modeling, the Stochastic Gradient Boosting ensemble algorithm yielded the best training and validation results, which justify the additional processing the algorithm requires.
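
The workflow summarized above (baseline spot-check, tuning, then scoring on a held-out validation set) can be sketched roughly as follows. This is a minimal illustration, not the code behind the reported numbers. It assumes scikit-learn, uses a small synthetic dataset from make_classification in place of the credit card data, compares four algorithms instead of ten, and searches a tiny, arbitrarily chosen parameter grid. Stochastic Gradient Boosting is approximated here by GradientBoostingClassifier with subsample below 1.0. The names SEED, candidates, baseline, and search are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    GridSearchCV,
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)
from sklearn.tree import DecisionTreeClassifier

SEED = 7  # arbitrary seed, fixed so repeated runs behave the same way

# Assumption: synthetic stand-in data; the shape and class balance are
# arbitrary and not taken from the credit card dataset.
X, y = make_classification(
    n_samples=600, n_features=23, n_informative=8,
    weights=[0.78, 0.22], random_state=SEED,
)

# Hold out a validation set that is not touched during modeling or tuning.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=SEED
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)

# Step 1: baseline spot-check of several algorithms with default-like settings.
candidates = {
    "LR": LogisticRegression(max_iter=1000),
    "CART": DecisionTreeClassifier(random_state=SEED),
    "RF": RandomForestClassifier(n_estimators=50, random_state=SEED),
    # subsample < 1.0 makes the gradient boosting stochastic
    "SGB": GradientBoostingClassifier(
        n_estimators=50, subsample=0.8, random_state=SEED
    ),
}
baseline = {
    name: cross_val_score(
        model, X_train, y_train, cv=cv, scoring="accuracy"
    ).mean()
    for name, model in candidates.items()
}
for name, score in sorted(baseline.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean CV accuracy = {score:.4f}")

# Step 2: tune the boosting model over a small, hypothetical grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}
search = GridSearchCV(
    GradientBoostingClassifier(subsample=0.8, random_state=SEED),
    param_grid, cv=cv, scoring="accuracy",
)
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)
print(f"Best mean CV accuracy: {search.best_score_:.4f}")

# Step 3: score the refitted best model on the held-out validation set.
val_accuracy = search.score(X_val, y_val)
print(f"Validation accuracy: {val_accuracy:.4f}")
```

With the real data, X and y would come from the dataset referenced above, and the candidate list and parameter grid would be widened to match the ten algorithms and the tuning trials described in the conclusion.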

The HTML-formatted report can be found here on GitHub.