Simple Classification Model for Text Messages with Python

Methodology Credit: Reproduced and adapted from the tutorial "SMS Spam Detection with Various Classifiers" made available by Evgeny Volkov.

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

Data Set Description: https://www.kaggle.com/uciml/sms-spam-collection-dataset

Original Reference: http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/

Modeling Approach: binary classification

The SMS Spam Collection is a set of tagged SMS messages collected for SMS spam research. It contains 5,574 English-language messages, each tagged as either ham (legitimate) or spam.
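As a rough illustration of how the ham/spam tags become a binary target, the sketch below parses tab-separated "label, message" lines into parallel lists. The tab-separated layout is an assumption based on the original distribution of the collection (the Kaggle CSV is laid out differently and would need a different reader). The function name `parse_labeled_messages`, the 0/1 encoding, and the two sample lines are hypothetical and are not taken from the data set.

```python
import csv
import io

# Assumed encoding of the two tags as a binary target (hypothetical choice).
LABEL_TO_TARGET = {"ham": 0, "spam": 1}


def parse_labeled_messages(handle):
    """Return (texts, targets) from tab-separated 'label<TAB>message' lines."""
    texts, targets = [], []
    # QUOTE_NONE keeps quotation marks inside messages as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        label = row[0]
        text = "\t".join(row[1:])  # re-join in case the message itself has tabs
        texts.append(text)
        targets.append(LABEL_TO_TARGET[label])  # raises KeyError on unknown tags
    return texts, targets


# Two made-up lines standing in for the real file.
sample = io.StringIO("ham\tSee you at lunch?\nspam\tClaim your free prize now\n")
texts, targets = parse_labeled_messages(sample)
print(list(zip(targets, texts)))
```

For the real file, the same function could be given an open file handle in place of the in-memory sample.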

We will spot-check a suite of linear, nonlinear, and ensemble machine learning algorithms and compare their estimated accuracy. For this project, we will evaluate nine different algorithms:

Linear Algorithms: Logistic Regression (LR)

Nonlinear Algorithms: Decision Tree (DTC), Support Vector Machine (SVC), Multinomial Naive Bayes (MNB), and k-Nearest Neighbors (KNC)

Ensemble Algorithms: Random Forest (RFC), AdaBoost (ABC), Bagging (BC), and Extra Trees (ETC)
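The spot-check over the nine algorithms listed above could be sketched roughly as follows with scikit-learn. This is a minimal sketch under stated assumptions, not the project's actual script: the TF-IDF features, the 3-fold stratified cross-validation, the seed, and all hyperparameters are choices made for illustration, and the twelve messages are invented toy examples rather than rows from the data set.

```python
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              ExtraTreesClassifier, RandomForestClassifier)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Invented toy messages (NOT from the SMS Spam Collection); 1 = spam, 0 = ham.
toy_texts = [
    "Win a free prize now, call this number",
    "Congratulations, you won cash, claim today",
    "Free entry to our weekly prize draw, text WIN",
    "Urgent: your account won a bonus, reply now",
    "Claim your free ringtone, text now",
    "You have been selected for a cash prize",
    "Are we still meeting for lunch today?",
    "I will call you when I get home",
    "Can you pick up some milk on the way?",
    "See you at the station at six",
    "Thanks for the help with my homework",
    "Running late, start dinner without me",
]
toy_labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# The nine algorithms, keyed by the abbreviations used in this document.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "DTC": DecisionTreeClassifier(random_state=7),
    "SVC": SVC(),
    "MNB": MultinomialNB(),
    "KNC": KNeighborsClassifier(n_neighbors=3),
    "RFC": RandomForestClassifier(n_estimators=50, random_state=7),
    "ABC": AdaBoostClassifier(random_state=7),
    "BC": BaggingClassifier(random_state=7),
    "ETC": ExtraTreesClassifier(n_estimators=50, random_state=7),
}


def spot_check(texts, labels, models, n_splits=3, seed=7):
    """Return the mean cross-validated accuracy for each named model."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        # Fit the vectorizer inside each fold so no test text leaks into training.
        pipeline = make_pipeline(TfidfVectorizer(), model)
        scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="accuracy")
        results[name] = scores.mean()
    return results


results = spot_check(toy_texts, toy_labels, models)
for name, accuracy in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name}: {accuracy:.3f}")
```

Wrapping the vectorizer and the classifier in one pipeline keeps the comparison fair, since every algorithm sees the same folds and the vocabulary is learned only from each training split. Scores on a toy sample this small say nothing about performance on the full collection.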

The HTML-formatted report can be found here on GitHub.