TY - JOUR
T1 - ECG Semantic Integrator (ESI)
T2 - A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text
AU - Yu, Han
AU - Guo, Peikun
AU - Sano, Akane
N1 - Publisher Copyright: © 2024, Transactions on Machine Learning Research. All rights reserved.
PY - 2024
Y1 - 2024
N2 - The use of deep learning in electrocardiogram (ECG) analysis has improved the accuracy and efficiency of cardiac healthcare diagnostics. In this work, we address a critical challenge in the field of ECG analysis with deep learning: learning robust representations without large-scale labeled datasets. We propose ECG Semantic Integrator (ESI), a novel multimodal contrastive pretraining framework that jointly learns from ECG signals and associated textual descriptions. ESI employs a dual objective function that comprises a contrastive loss and a captioning loss to develop representations of ECG data. To create a sufficiently large and diverse training dataset, we develop a retrieval-augmented generation (RAG)-based Large Language Model (LLM) pipeline, called Cardio Query Assistant (CQA). This pipeline is designed to generate detailed textual descriptions for ECGs from diverse databases. The generated text includes information about demographics and waveform patterns. This approach enables us to compile a large-scale multimodal dataset with over 660,000 ECG-text pairs for pretraining ESI, which then learns robust and generalizable representations of 12-lead ECG. We validate our approach through various downstream tasks, including arrhythmia detection and ECG-based subject identification. Our experimental results demonstrate substantial improvements over strong baselines in these tasks. These baselines encompass supervised and self-supervised learning methods, as well as prior multimodal pretraining approaches. Our work shows the potential of multimodal pretraining to improve the analysis of ECG signals.
UR - http://www.scopus.com/inward/record.url?scp=85218453429&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85218453429&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85218453429
SN - 2835-8856
VL - 2024
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -