Heart Alert: A heart disease prediction system using machine learning approach and optimization techniques

Justin Allen P. Denopol, Ma. Sheila A. Magboo, Vincent Peter C. Magboo

Abstract


Background: Cardiovascular diseases are among the top three leading causes of mortality in the Philippines, accounting for 17.8% of total deaths. Lifestyle-related factors such as alcohol consumption, smoking, poor diet and nutrition, highly sedentary behavior, overweight, and obesity have been increasingly implicated in the high rates of heart disease among Filipinos, placing a significant burden on the country's healthcare system. The objective of this study was to predict the presence of heart disease using various machine learning algorithms (support vector machine, naïve Bayes, random forest, logistic regression, decision tree, and adaptive boosting) evaluated on an anonymized, publicly available cardiovascular disease dataset.

Methodology: Various machine learning algorithms were applied to an anonymized, publicly available cardiovascular dataset from a machine learning data repository (IEEE DataPort). A web-based application named Heart Alert, which predicts the risk of developing heart disease, was developed using the best-performing machine learning model. The effects of different optimization techniques on the classification performance of the machine learning algorithms were assessed: imputation methods (mean, median, mode, and multiple imputation by chained equations) and a feature selection method (recursive feature elimination). All simulation experiments were implemented in Python 3.8 with its machine learning libraries (Scikit-learn, Keras, TensorFlow, Pandas, Matplotlib, Seaborn, NumPy).
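The comparison described above can be illustrated with a minimal Scikit-learn sketch. It is not the authors' code: the data are synthetic (generated with `make_classification`, not the IEEE DataPort dataset), the 5% missingness rate, the train/test split, the RBF kernel, and the number of features kept by recursive feature elimination are all assumptions chosen for illustration. Scikit-learn's `IterativeImputer` is used as a stand-in for multiple imputation by chained equations; with default settings it produces a single chained-equations imputation rather than multiple imputed datasets.

```python
import numpy as np
from sklearn.datasets import make_classification
# IterativeImputer is experimental and must be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 11 features, binary target.
X, y = make_classification(n_samples=400, n_features=11,
                           n_informative=6, random_state=0)

# Knock out about 5% of the values so the imputers have work to do (assumption).
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# The four imputation strategies named in the abstract.
imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "mode": SimpleImputer(strategy="most_frequent"),
    "mice": IterativeImputer(max_iter=10, random_state=0),
}


def build_pipeline(imputer, use_rfe):
    """Impute, scale, optionally select features with RFE, then classify."""
    steps = [("impute", imputer), ("scale", StandardScaler())]
    if use_rfe:
        # RFE needs an estimator exposing coefficients; logistic regression
        # is used here, and keeping 6 features is an arbitrary choice.
        selector = RFE(LogisticRegression(max_iter=1000),
                       n_features_to_select=6)
        steps.append(("rfe", selector))
    steps.append(("clf", SVC(kernel="rbf")))
    return Pipeline(steps)


# Test-set accuracy for every (imputation, feature selection) combination.
results = {}
for name, imputer in imputers.items():
    for use_rfe in (False, True):
        model = build_pipeline(imputer, use_rfe)
        model.fit(X_train, y_train)
        results[(name, use_rfe)] = model.score(X_test, y_test)

for (name, use_rfe), acc in sorted(results.items()):
    print(f"{name:>6}  rfe={use_rfe!s:<5}  accuracy={acc:.3f}")
```

Placing the imputer and selector inside a `Pipeline` means both are fitted on the training split only, which avoids leaking test-set information into the preprocessing. The same loop could be repeated for the other classifiers named in the abstract by swapping the final step.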

Results: The support vector machine without imputation and feature selection obtained the highest performance metrics (90.2% accuracy, 87.7% sensitivity, 93.6% specificity, 94.9% precision, 91.2% F1-score, and an area under the receiver operating characteristic curve of 0.902) and was used to implement the heart disease prediction system (Heart Alert). Following very closely were random forest with mean or median imputation and logistic regression with mode imputation, all without feature selection.
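For readers less familiar with these metrics, the following sketch shows how each is derived from the four cells of a binary confusion matrix. The counts in the usage example are hypothetical and are not taken from the study; the function name `classification_metrics` is likewise an illustrative choice. The last lines check that the reported F1-score is the harmonic mean of the reported precision and sensitivity.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion-matrix counts.

    tp/fn: diseased cases predicted positive/negative;
    tn/fp: healthy cases predicted negative/positive.
    """
    sensitivity = tp / (tp + fn)          # recall, true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)            # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not the study's confusion matrix).
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Consistency check on the values reported in the abstract:
# F1 is the harmonic mean of precision (0.949) and sensitivity (0.877).
reported_f1 = 2 * 0.949 * 0.877 / (0.949 + 0.877)
print(f"F1 from reported precision and sensitivity: {reported_f1:.3f}")
```

Sensitivity and specificity are reported separately because a screening tool can trade one against the other; accuracy alone would hide that trade-off.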

Conclusion: The performance of the four best machine learning models suggests that, for this dataset, imputation of missing values may not be necessary. Likewise, recursive feature elimination may not be beneficial, as all variables appear to be important for heart disease prediction. Early, accurate diagnosis leading to prompt intervention is crucial, as it improves the patient's quality of life and reduces the risk of cardiac events.


Keywords


heart disease prediction; machine learning; imputation techniques; feature selection; support vector machine.


Print ISSN: 2704-3517; Online ISSN: 2783-042X