Regression Model for Bike Sharing Using R – Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

Dataset Used: Bike Sharing Dataset

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset

For available performance benchmarks, please consult: https://www.kaggle.com/contactprad/bike-share-daily-data

INTRODUCTION: Using the data generated by a bike sharing system, this project attempts to predict the daily demand for bike sharing. In this iteration, we use the available data to identify a suitable machine learning algorithm for future prediction tasks. We kept data transformation to a minimum and dropped several attributes that either do not make sense to keep or will not help in training the model. The goal of this iteration is to find a sufficiently accurate (low-error) algorithm for those future prediction tasks.
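
As an illustration of the minimal data preparation described above, the sketch below drops a few attributes from a toy data frame and splits the remainder into training and validation sets. The column names follow the UCI day.csv file, in which the casual and registered counts add up to the cnt target. The toy values, the choice of columns to drop, the 70/30 split ratio, and the seed are all assumptions made for illustration; the text above does not list which attributes were actually removed.

```r
# Toy stand-in for the daily bike sharing data; all values are made up.
set.seed(888)  # arbitrary seed, for reproducibility only
n <- 20
bikes <- data.frame(
  instant    = seq_len(n),
  dteday     = as.character(as.Date("2011-01-01") + 0:(n - 1)),
  temp       = runif(n),
  hum        = runif(n),
  casual     = sample(0:500, n, replace = TRUE),
  registered = sample(500:4000, n, replace = TRUE)
)
bikes$cnt <- bikes$casual + bikes$registered  # total daily rentals (target)

# Assumed drop list: a row index, a raw date string, and two counts
# that add up to the target and would leak it into the model.
drop_cols <- c("instant", "dteday", "casual", "registered")
model_data <- bikes[, !(names(bikes) %in% drop_cols)]

# Assumed 70/30 split into training and validation rows.
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train_set <- model_data[train_idx, ]
valid_set <- model_data[-train_idx, ]
```

Holding out the validation rows before any modeling keeps the later validation scores independent of the algorithm selection and tuning steps.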

CONCLUSION: The baseline performance of predicting the target variable achieved an average RMSE value of 1322. Three algorithms (Bagged CART, Random Forest, and Stochastic Gradient Boosting) achieved lower RMSE and higher R-squared values than the rest during the initial modeling round. After a series of tuning trials with these three algorithms, Stochastic Gradient Boosting produced the lowest RMSE value of 1213 and the highest R-squared value of 0.6093 on the training data.

Stochastic Gradient Boosting also processed the validation dataset with an RMSE value of 1177 and an R-squared value of 0.6329, both better than the average training result. For this project, the Stochastic Gradient Boosting ensemble algorithm yielded consistently strong training and validation results, which warrant the additional processing required by the algorithm.
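
For readers unfamiliar with the two metrics quoted above, here is a minimal base R sketch of how RMSE and R-squared can be computed from observed and predicted values. The function names and the sample numbers are hypothetical. R-squared is computed here as the squared correlation between observed and predicted values, which is one common definition; the report does not state which definition it used.

```r
# Root mean squared error: typical size of a prediction error,
# in the same units as the target (daily rentals).
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# R-squared as the squared Pearson correlation between observed
# and predicted values (assumed definition, see the note above).
r_squared <- function(actual, predicted) {
  cor(actual, predicted)^2
}

# Hypothetical observed and predicted daily counts.
actual    <- c(1200, 3400, 4500, 2100, 5600)
predicted <- c(1500, 3000, 4800, 2500, 5000)

rmse(actual, predicted)
r_squared(actual, predicted)
```

A lower RMSE and a higher R-squared both indicate a better fit, which is why the two numbers move in opposite directions in the results above.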

The HTML-formatted report is available on GitHub.