TY - GEN
T1 - GraphSeqLM
T2 - 34th ACM Web Conference, WWW Companion 2025
AU - Zhang, Heming
AU - Huang, Di
AU - Chen, Yixin
AU - Li, Fuhai
N1 - Publisher Copyright: © 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2025/5/23
Y1 - 2025/5/23
N2 - The integration of multi-omic data is pivotal for understanding complex diseases, but its high dimensionality and noise present significant challenges. Graph Neural Networks (GNNs) offer a robust framework for analyzing large-scale signaling pathways and protein-protein interaction networks, yet they face limitations in expressivity when capturing intricate biological relationships. To address this, we propose Graph Sequence Language Model (GraphSeqLM), a framework that enhances GNNs with biological sequence embeddings generated by Large Language Models (LLMs). These embeddings encode structural and biological properties of DNA, RNA, and proteins, augmenting GNNs with enriched features for analyzing sample-specific multi-omic data. By integrating topological, sequence-derived, and biological information, GraphSeqLM demonstrates superior predictive accuracy and outperforms existing methods, paving the way for more effective multi-omic data integration in precision medicine.
KW - Biological Sequences
KW - Graph Neural Networks
KW - Large Language Models
KW - Multi-omic Data
KW - Precision Medicine
UR - https://www.scopus.com/pages/publications/105009217112
UR - https://www.scopus.com/inward/citedby.url?scp=105009217112&partnerID=8YFLogxK
U2 - 10.1145/3701716.3715503
DO - 10.1145/3701716.3715503
M3 - Conference contribution
AN - SCOPUS:105009217112
T3 - WWW Companion 2025 - Companion Proceedings of the ACM Web Conference 2025
SP - 1510
EP - 1513
BT - WWW Companion 2025 - Companion Proceedings of the ACM Web Conference 2025
PB - Association for Computing Machinery
Y2 - 28 April 2025 through 2 May 2025
ER -