TY - GEN
T1 - Interpretability in healthcare: A comparative study of local machine learning interpretability techniques
AU - Elshawi, Radwa
AU - Sherif, Youssef
AU - Al-Mallah, Mouaz
AU - Sakr, Sherif
N1 - Funding Information:
The work of Radwa Elshawi is funded by the European Regional Development Funds via the Mobilitas Plus programme (MOBJD341). The work of Sherif Sakr is funded by the European Regional Development Funds via the Mobilitas Plus programme (grant MOBTT75).
Publisher Copyright:
© 2019 IEEE.
PY - 2019/6
Y1 - 2019/6
N2 - Although complex machine learning models (e.g., Random Forest, Neural Networks) commonly outperform traditional, simpler interpretable models (e.g., Linear Regression, Decision Tree), clinicians in the healthcare domain find it hard to understand and trust these complex models because their predictions lack intuition and explanation. With the new General Data Protection Regulation (GDPR), the plausibility and verifiability of predictions made by machine learning models have become essential. To tackle this challenge, several machine learning interpretability techniques have recently been developed and introduced. In general, these techniques aim to shed light on the prediction process of machine learning models and to explain how the model predictions were produced. In practice, however, assessing the quality of the explanations provided by the various interpretability techniques remains an open question. In this paper, we present a comprehensive experimental evaluation of three recent and popular local model-agnostic interpretability techniques, namely LIME, SHAP and Anchors, on different types of real-world healthcare data. Our experimental evaluation compares the techniques along several dimensions, including identity, stability, separability, similarity, execution time and bias detection. The results of our experiments show that LIME achieves the lowest performance on the identity metric and the highest performance on the separability metric across all datasets included in this study. On average, SHAP has the shortest time to output an explanation across all datasets. For bias detection, SHAP enables participants to detect the bias more effectively.
AB - Although complex machine learning models (e.g., Random Forest, Neural Networks) commonly outperform traditional, simpler interpretable models (e.g., Linear Regression, Decision Tree), clinicians in the healthcare domain find it hard to understand and trust these complex models because their predictions lack intuition and explanation. With the new General Data Protection Regulation (GDPR), the plausibility and verifiability of predictions made by machine learning models have become essential. To tackle this challenge, several machine learning interpretability techniques have recently been developed and introduced. In general, these techniques aim to shed light on the prediction process of machine learning models and to explain how the model predictions were produced. In practice, however, assessing the quality of the explanations provided by the various interpretability techniques remains an open question. In this paper, we present a comprehensive experimental evaluation of three recent and popular local model-agnostic interpretability techniques, namely LIME, SHAP and Anchors, on different types of real-world healthcare data. Our experimental evaluation compares the techniques along several dimensions, including identity, stability, separability, similarity, execution time and bias detection. The results of our experiments show that LIME achieves the lowest performance on the identity metric and the highest performance on the separability metric across all datasets included in this study. On average, SHAP has the shortest time to output an explanation across all datasets. For bias detection, SHAP enables participants to detect the bias more effectively.
KW - Black-Box Model
KW - Machine Learning
KW - Machine Learning Interpretability
KW - Model-Agnostic Interpretability
UR - http://www.scopus.com/inward/record.url?scp=85071000896&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85071000896&partnerID=8YFLogxK
U2 - 10.1109/CBMS.2019.00065
DO - 10.1109/CBMS.2019.00065
M3 - Conference contribution
AN - SCOPUS:85071000896
T3 - Proceedings - IEEE Symposium on Computer-Based Medical Systems
SP - 275
EP - 280
BT - Proceedings - 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems, CBMS 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 32nd IEEE International Symposium on Computer-Based Medical Systems, CBMS 2019
Y2 - 5 June 2019 through 7 June 2019
ER -