An Integrated Machine Learning Framework for Effective Prediction of Cardiovascular Diseases

Abstract:

Cardiovascular diseases are among the most life-threatening conditions and have the highest mortality rate globally. Over time, they have become very common and are now overstretching national healthcare systems. The major risk factors for cardiovascular diseases are high blood pressure, family history, stress, age, gender, cholesterol, Body Mass Index (BMI), and an unhealthy lifestyle. Based on these factors, researchers have proposed various approaches for early diagnosis. However, the accuracy of the proposed techniques needs improvement, given the inherent criticality and life-threatening risks of cardiovascular diseases. In this article, the MaLCaDD (Machine Learning based Cardiovascular Disease Diagnosis) framework is proposed for the effective prediction of cardiovascular diseases with high precision. In particular, the framework first handles missing values (via the mean replacement technique) and data imbalance (via the Synthetic Minority Over-sampling Technique, SMOTE). Subsequently, the Feature Importance technique is used for feature selection. Finally, an ensemble of Logistic Regression and K-Nearest Neighbor (KNN) classifiers is proposed to achieve higher prediction accuracy. The framework is validated on three benchmark datasets (i.e., Framingham, Heart Disease, and Cleveland), achieving accuracies of 99.1%, 98.0%, and 95.5%, respectively. Finally, the comparative analysis shows that MaLCaDD predictions are more accurate (with a reduced set of features) than those of existing state-of-the-art approaches. Therefore, MaLCaDD is highly reliable and can be applied in real environments for the early diagnosis of cardiovascular diseases.
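The two preprocessing steps named in the abstract, mean replacement and SMOTE, can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not describe. The function names (`impute_mean`, `smote`), the toy feature columns, and the parameter values are assumptions made for illustration only. The sketch uses the Python standard library alone and omits the feature selection and ensemble stages.

```python
import random
from statistics import mean


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed))
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: columns are [age, systolic_bp]; None marks a missing value.
X = [[50, 120], [60, None], [55, 130], [40, 110], [70, 160], [65, None]]
y = [0, 0, 0, 0, 1, 1]  # 1 = disease (minority class)

X_filled = impute_mean(X)
minority = [x for x, label in zip(X_filled, y) if label == 1]
n_needed = y.count(0) - y.count(1)  # synthetic samples needed to balance classes
X_balanced = X_filled + smote(minority, n_needed, k=1)
y_balanced = y + [1] * n_needed
```

In this toy example the missing blood-pressure values are filled with the column mean of the observed entries, and the minority class is oversampled until both classes have the same number of rows. Each synthetic row lies on the line segment between two real minority rows, which is the core idea of SMOTE. In practice, a maintained library implementation would normally be preferred over a hand-written one.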