MEDPOLY: A Proposed Approach for Cardiovascular Disease Prediction Using Various Ensemble Models

Sreekumari S, Rajni Bhalla and Gursharan Singh

doi:10.7492/qywdpz98

Abstract

An ensemble model combines multiple models to make predictions. Each model is trained separately, all models address the same problem, and their outputs are combined into a final prediction. Research shows that ensemble models provide better results than individual models. In the 1990s, four popular ensemble techniques, bagging, boosting, stacking, and voting, were introduced to improve ensemble performance. A two-tailed test indicates that the ensemble model outperforms the single-model approach. In this study, two ensemble models are proposed and applied to different datasets. The first model, DNL, combines a decision tree, a neural network, and logistic regression, and is applied to four different datasets. The second model, named MEDPOLY, combines 24 different classifiers and is tested using bagging, boosting, stacking, and voting classifiers. The DNL ensemble with stacking performs better than with bagging and achieves an accuracy of 98% in predicting cardiovascular disease. The MEDPOLY ensemble is also compared with DNL, and the results reveal that DNL outperforms the MEDPOLY framework. Analysis indicates that DNL can handle both linear relationships and complex patterns, and can also cope with overfitting. MEDPOLY, when applied to a dataset that is not large or diverse enough, can overfit. Combining a smaller set of classifiers provides better and more interpretable results.
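
To make the DNL idea concrete, the following is a minimal sketch of a stacked ensemble of a decision tree, a neural network, and logistic regression, written with scikit-learn. It is not the authors' implementation: the synthetic dataset, the hyperparameters, the feature scaling, and the choice of logistic regression as the meta-learner are all assumptions made for illustration, and the accuracy it prints says nothing about the 98% reported in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption); the study's cardiovascular
# datasets are not reproduced here.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The three DNL base learners. Hyperparameters are illustrative guesses.
base_learners = [
    ("tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
    ("mlp", make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    )),
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
]

# Stacking: a meta-learner is trained on cross-validated predictions
# of the base learners. The meta-learner choice is an assumption.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

stack_accuracy = stack.score(X_test, y_test)
print(f"stacked accuracy on synthetic data: {stack_accuracy:.3f}")
```

Stacking differs from bagging in that the base learners are of different types and their outputs feed a second-level model, rather than being averaged over bootstrap resamples of one learner type. That may be one reason a small, diverse set of classifiers can cover both linear and non-linear structure.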