TY - JOUR
T1 - RefAI
T2 - a GPT-powered retrieval-augmented generative tool for biomedical literature recommendation and summarization
AU - Li, Yiming
AU - Zhao, Jeff
AU - Li, Manqi
AU - Dang, Yifang
AU - Yu, Evan
AU - Li, Jianfu
AU - Sun, Zenan
AU - Hussein, Usama
AU - Wen, Jianguo
AU - Abdelhameed, Ahmed M.
AU - Mai, Junhua
AU - Li, Shenduo
AU - Yu, Yue
AU - Hu, Xinyue
AU - Yang, Daowei
AU - Feng, Jingna
AU - Li, Zehan
AU - He, Jianping
AU - Tao, Wei
AU - Duan, Tiehang
AU - Lou, Yanyan
AU - Li, Fang
AU - Tao, Cui
N1 - Publisher Copyright: © The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.
PY - 2024/9/1
Y1 - 2024/9/1
AB - Objectives: Precise literature recommendation and summarization are crucial for biomedical professionals. While the latest iteration of generative pretrained transformer (GPT) incorporates 2 distinct modes—real-time search and pretrained model utilization—it encounters challenges in dealing with these tasks. Specifically, the real-time search can pinpoint some relevant articles but occasionally provides fabricated papers, whereas the pretrained model excels in generating well-structured summaries but struggles to cite specific sources. In response, this study introduces RefAI, an innovative retrieval-augmented generative tool designed to synergize the strengths of large language models (LLMs) while overcoming their limitations. Materials and Methods: RefAI utilized PubMed for systematic literature retrieval, employed a novel multivariable algorithm for article recommendation, and leveraged GPT-4 turbo for summarization. Ten queries under 2 prevalent topics (“cancer immunotherapy and target therapy” and “LLMs in medicine”) were chosen as use cases and 3 established counterparts (ChatGPT-4, ScholarAI, and Gemini) as our baselines. The evaluation was conducted by 10 domain experts through standard statistical analyses for performance comparison. Results: The overall performance of RefAI surpassed that of the baselines across 5 evaluated dimensions—relevance and quality for literature recommendation, accuracy, comprehensiveness, and reference integration for summarization, with the majority exhibiting statistically significant improvements (P-values <.05). Discussion: RefAI demonstrated substantial improvements in literature recommendation and summarization over existing tools, addressing issues like fabricated papers, metadata inaccuracies, restricted recommendations, and poor reference integration. Conclusion: By augmenting LLM with external resources and a novel ranking algorithm, RefAI is uniquely capable of recommending high-quality literature and generating well-structured summaries, holding the potential to meet the critical needs of biomedical professionals in navigating and synthesizing vast amounts of scientific literature.
KW - generative pretrained transformer
KW - large language model
KW - literature recommendation
KW - retrieval-augmented generation
KW - text summarization
UR - http://www.scopus.com/inward/record.url?scp=85198666791&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85198666791&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocae129
DO - 10.1093/jamia/ocae129
M3 - Article
C2 - 38857454
AN - SCOPUS:85198666791
SN - 1067-5027
VL - 31
SP - 2030
EP - 2039
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 9
ER -