TY - JOUR
T1 - A survey of transformers and large language models for ECG diagnosis
T2 - advances, challenges, and future directions
AU - Ansari, Mohammed Yusuf
AU - Yaqoob, Mohammed
AU - Ishaq, Mohammed
AU - Flushing, Eduardo Feo
AU - Mangalote, Iffa Afsa Changaai
AU - Dakua, Sarada Prasad
AU - Aboumarzouk, Omar
AU - Righetti, Raffaella
AU - Qaraqe, Marwa
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/6/4
Y1 - 2025/6/4
AB - Electrocardiograms (ECGs) are widely utilized in clinical practice as a non-invasive diagnostic tool for detecting cardiovascular diseases. Convolutional neural networks (CNNs) have been the primary choice for ECG analysis due to their capability to process raw signals. However, their localized convolutional operations limit the ability to capture long-range temporal dependencies across heartbeats, impeding a comprehensive cardiovascular assessment. To address these limitations, transformer-based frameworks have been introduced, employing self-attention mechanisms to effectively model complex temporal patterns over entire ECG sequences. Recent advancements in large language models (LLMs) have further expanded the utility of transformers by enabling multimodal integration and facilitating zero-shot diagnosis, thereby enhancing the scope of ECG-based clinical applications. Despite the increasing adoption of these methodologies, a comprehensive survey systematically examining transformer and LLM-based approaches for ECG analysis is absent from the literature. Consequently, this article surveys existing methods and proposes a novel hierarchical taxonomy based on the complexity of diagnosis, ranging from single-beat analysis to multi-beat and full-length signal evaluations. A thorough cross-category comparison is performed to highlight overarching commonalities and limitations. In light of these limitations, the paper presents a discussion of critical gaps and introduces new future directions aimed at improving ECG representation, enhancing positional encodings, refining self-attention architectures, and addressing challenges related to hallucinations and confidence measures in LLMs. The insights and guidelines presented aim to inform future research and clinical practices, enabling the next generation of intelligent ECG diagnostic systems.
KW - Arrhythmia
KW - ECG representation
KW - Hallucination
KW - Large language models
KW - Myocardial infarction
KW - Positional encoding
KW - Self-attention architecture
KW - Single-beat and multi-beat analysis
KW - Sleep apnea
KW - Zero-shot diagnosis
UR - http://www.scopus.com/inward/record.url?scp=105007223043&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105007223043&partnerID=8YFLogxK
U2 - 10.1007/s10462-025-11259-x
DO - 10.1007/s10462-025-11259-x
M3 - Article
AN - SCOPUS:105007223043
SN - 0269-2821
VL - 58
JO - Artificial Intelligence Review
JF - Artificial Intelligence Review
IS - 9
M1 - 261
ER -