Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Human Activities with Smartphone Dataset is a multi-class classification problem in which we try to predict one of six possible activities.
INTRODUCTION: Researchers collected the data from experiments with a group of 30 volunteers, each of whom performed six activities while wearing a smartphone on the waist. Using the phone's embedded accelerometer and gyroscope, the researchers captured measurements for the activities WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, and LAYING. The dataset was randomly partitioned into two sets: 70% of the volunteers were selected for generating the training data and 30% for the test data.
In iteration Take1, the script focused on evaluating various machine learning algorithms and identifying the algorithm that produces the best accuracy metric. Iteration Take1 established a baseline performance in terms of accuracy and processing time. For this iteration (Take2), we examine the feasibility of using dimensionality reduction techniques to reduce the processing time while still maintaining an adequate level of prediction accuracy. The first technique we explore is eliminating collinear attributes based on a threshold of 85%.
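To make the collinearity filter concrete, here is a minimal Python sketch of one way such a filter could work. It is an illustration under stated assumptions, not the script used in this project: the greedy keep-or-drop rule, the function names `pearson` and `drop_collinear`, and the toy attribute values are all hypothetical. An attribute is dropped when its absolute Pearson correlation with an already-kept attribute is at least 0.85.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return cov / sqrt(var_x * var_y)


def drop_collinear(columns, threshold=0.85):
    """Greedy filter (hypothetical helper).

    `columns` maps attribute name -> list of values. An attribute is kept
    only if |r| < threshold against every attribute kept so far.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up toy data for illustration only, not taken from the dataset.
toy = {
    "acc_x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "acc_x_scaled": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly 2 * acc_x
    "gyro_z": [0.3, -1.2, 0.8, -0.5, 0.1],
}
selected = drop_collinear(toy)
print(selected)
```

Because the rule is greedy, which member of a correlated pair survives depends on column order; other selection rules (for example, dropping the attribute with the larger mean correlation) are equally plausible choices.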
CONCLUSION: In the previous iteration, Take1, the baseline performance of the ten algorithms achieved an average accuracy of 84.68%. Three algorithms (Linear Discriminant Analysis, Support Vector Machine, and Stochastic Gradient Boosting) achieved the top three accuracy scores after the first round of modeling. After a series of tuning trials, Linear Discriminant Analysis turned in the top result using the training data, with an average accuracy of 95.43%. Using the optimized tuning parameter available, the Linear Discriminant Analysis algorithm processed the validation dataset with an accuracy of 96.23%, which was even better than the accuracy from the training data.
In the current iteration, Take2, the baseline performance of the ten algorithms achieved an average accuracy of 83.54%. Three algorithms (Linear Discriminant Analysis, Support Vector Machine, and Stochastic Gradient Boosting) achieved the top three accuracy scores after the first round of modeling. After a series of tuning trials, Support Vector Machine turned in the top result using the training data, with an average accuracy of 93.34%. Using the optimized tuning parameter available, the Support Vector Machine algorithm processed the validation dataset with an accuracy of 93.82%, which was slightly better than the accuracy from the training data.
From the model-building activities, the number of attributes went from 561 down to 172 after eliminating 389 variables that are at least 85% collinear. The processing time went from 8 hours 16 minutes in iteration Take1 down to 2 hours and 7 minutes in iteration Take2. That was a reduction in model training and processing time of 74%.
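The figures quoted above can be checked with a few lines of Python. This is only an arithmetic check on the numbers reported in the text; the variable names are illustrative.

```python
# Attribute counts reported in the text.
attributes_before = 561
attributes_removed = 389
attributes_after = attributes_before - attributes_removed

# Processing times reported in the text, converted to minutes.
take1_minutes = 8 * 60 + 16  # 8 hours 16 minutes
take2_minutes = 2 * 60 + 7   # 2 hours 7 minutes
time_reduction = (take1_minutes - take2_minutes) / take1_minutes

print(attributes_after)          # 172
print(f"{time_reduction:.0%}")   # 74%
```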
In conclusion, the model still achieved an acceptable level of accuracy with the reduced number of attributes. Furthermore, the Support Vector Machine algorithm achieved the top training and validation results in this iteration. For this project, Support Vector Machine should be considered for further modeling or production use.
Dataset Used: Human Activity Recognition Using Smartphone Data Set
Dataset ML Model: Multi-class classification with numerical attributes
One potential source of performance benchmarks: https://www.kaggle.com/uciml/human-activity-recognition-with-smartphones
The HTML-formatted report is available on GitHub.