Simple Classification Model for Diabetes Prediction Using R

Dataset Used: Pima Indians Diabetes Database

Data Set ML Model: Classification with numerical attributes

INTRODUCTION: The Pima Indians Diabetes Dataset is used to predict the onset of diabetes within five years in Pima Indians, given their medical details. It is a binary (2-class) classification problem. There are 768 observations with 8 input variables and 1 output variable. Missing values are believed to be encoded as zeros.
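Because missing values are believed to be stored as zeros, one common preparation step is to recode those zeros as NA before modeling. The following is a minimal sketch in base R, not the project's actual code: the toy data frame, its column names, the choice of columns where zero is treated as missing, and the helper recode_zeros are all assumptions made for illustration.

```r
# Toy data frame with hypothetical column names; this is NOT the real dataset.
toy <- data.frame(
  pregnant = c(6, 1, 0),        # zero is a legitimate value here
  glucose  = c(148, 0, 183),    # zero assumed to mean "not recorded"
  pressure = c(72, 66, 0),
  mass     = c(33.6, 0, 23.3),
  diabetes = factor(c("pos", "neg", "pos"))
)

# Assumption: columns in which a zero is physiologically implausible.
zero_as_missing <- c("glucose", "pressure", "mass")

# Hypothetical helper: replace zeros with NA in the selected columns only.
recode_zeros <- function(df, cols) {
  df[cols] <- lapply(df[cols], function(x) replace(x, x == 0, NA))
  df
}

cleaned <- recode_zeros(toy, zero_as_missing)

# Count the missing values per column after recoding.
missing_counts <- colSums(is.na(cleaned))
print(missing_counts)
```

Columns such as the number of pregnancies are left untouched, since a zero there is a valid observation rather than a gap.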

CONCLUSION: The baseline performance in predicting the class variable was an average accuracy of 75.85%. The best accuracy, achieved with Logistic Regression after a series of tuning trials, was 77.73%. In this case, the ensemble algorithms did not improve on the non-ensemble algorithms by enough to justify the additional processing required.
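For readers unfamiliar with how an accuracy figure for Logistic Regression is obtained, the sketch below fits a logistic regression with base R's glm() and scores it on a held-out split. It is only an illustration under stated assumptions: the data are simulated, the variable names (x1, x2, y), the 70/30 split, and the 0.5 cutoff are hypothetical, and it does not reproduce the project's tuning procedure or its reported numbers.

```r
set.seed(42)  # make the simulation repeatable

# Simulated data standing in for the real dataset (assumption).
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))
sim <- data.frame(x1 = x1, x2 = x2, y = y)

# Hypothetical 70/30 train/test split.
train_idx <- sample(seq_len(n), size = 0.7 * n)
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# Fit a logistic regression on the training rows.
fit <- glm(y ~ x1 + x2, data = train, family = binomial)

# Predicted probabilities on the held-out rows, thresholded at 0.5.
prob <- predict(fit, newdata = test, type = "response")
pred <- as.integer(prob > 0.5)

# Accuracy = share of held-out rows classified correctly.
accuracy <- mean(pred == test$y)
print(accuracy)
```

A single hold-out split gives a noisier estimate than an average over repeated resampling, so the number it prints should not be compared directly with the figures quoted above.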

The HTML-formatted report can be found here on GitHub.