TY - JOUR
T1 - Using machine learning on cardiorespiratory fitness data for predicting hypertension
T2 - The Henry Ford exercise testing (FIT) Project
AU - Sakr, Sherif
AU - Elshawi, Radwa
AU - Ahmed, Amjad
AU - Qureshi, Waqas T.
AU - Brawner, Clinton
AU - Keteyian, Steven
AU - Blaha, Michael J.
AU - Al-Mallah, Mouaz H.
N1 - Funding Information:
Funding was provided by King Abdullah International Medical Research Center. Funding grant number SP16/100 to SS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Publisher Copyright:
© 2018 Sakr et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2018/4
Y1 - 2018/4
N2 - This study evaluates and compares the performance of different machine learning techniques on predicting the individuals at risk of developing hypertension, and who are likely to benefit most from interventions, using the cardiorespiratory fitness data. The dataset of this study contains information of 23,095 patients who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 10-year follow-up. The variables of the dataset include information on vital signs, diagnosis and clinical laboratory measurements. Six machine learning techniques were investigated: LogitBoost (LB), Bayesian Network classifier (BN), Locally Weighted Naive Bayes (LWB), Artificial Neural Network (ANN), Support Vector Machine (SVM) and Random Tree Forest (RTF). Using different validation methods, the RTF model has shown the best performance (AUC = 0.93) and outperformed all other machine learning techniques examined in this study. The results have also shown that it is critical to carefully explore and evaluate the performance of the machine learning models using various model evaluation methods as the prediction accuracy can significantly differ.
AB - This study evaluates and compares the performance of different machine learning techniques on predicting the individuals at risk of developing hypertension, and who are likely to benefit most from interventions, using the cardiorespiratory fitness data. The dataset of this study contains information of 23,095 patients who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 10-year follow-up. The variables of the dataset include information on vital signs, diagnosis and clinical laboratory measurements. Six machine learning techniques were investigated: LogitBoost (LB), Bayesian Network classifier (BN), Locally Weighted Naive Bayes (LWB), Artificial Neural Network (ANN), Support Vector Machine (SVM) and Random Tree Forest (RTF). Using different validation methods, the RTF model has shown the best performance (AUC = 0.93) and outperformed all other machine learning techniques examined in this study. The results have also shown that it is critical to carefully explore and evaluate the performance of the machine learning models using various model evaluation methods as the prediction accuracy can significantly differ.
UR - http://www.scopus.com/inward/record.url?scp=85045648714&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045648714&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0195344
DO - 10.1371/journal.pone.0195344
M3 - Article
C2 - 29668729
AN - SCOPUS:85045648714
SN - 1932-6203
VL - 13
JO - PLoS ONE
JF - PLoS ONE
IS - 4
M1 - e0195344
ER -