Comparison of large language models in oral and maxillofacial surgery

Ricardo Grillo, Alexandre Hugo Llanos, Claudio Costa, Fernando Melhem-Elias

Research output: Contribution to journal (Article, peer-reviewed)

Abstract

This study evaluates the performance of six large language models (LLMs), namely ChatGPT, Claude, Copilot, DeepSeek, Gemini, and Perplexity, in generating content relevant to oral and maxillofacial surgery (OMFS). It focuses on their ability to provide accurate, comprehensive, and relevant information across five specific tasks. Each LLM was assessed on its responses to five prompts: (1) postoperative instructions for third molar surgery; (2) a list of best-selling books on orthognathic surgery; (3) the most cited articles in OMFS; (4) novel ideas for systematic reviews; and (5) emerging trends in OMFS. Responses were scored for relevance, comprehensiveness, and accuracy using predefined criteria, and the Kruskal-Wallis test was used to compare performance across tools. Overall, the LLMs performed similarly, although each showed particular strengths and weaknesses. For postoperative instructions, the models gave broadly comparable recommendations, although Perplexity underperformed. Gemini and Perplexity performed best at identifying best-selling books, whereas ChatGPT and Copilot struggled to retrieve highly cited articles. Copilot and Claude were more effective at suggesting novel systematic review topics, while ChatGPT, Claude, Copilot, and DeepSeek identified emerging trends most accurately. LLMs show considerable potential for supporting OMFS-related tasks, but their performance depends on the specific application. Although they excel at synthesising existing information and identifying trends, limited accuracy and occasional hallucinations highlight the need for human oversight. These findings support integrating artificial intelligence (AI) as a supplementary tool in clinical, academic, and research settings, where it should complement rather than replace human expertise.

Original language: English (US)
Journal: British Journal of Oral and Maxillofacial Surgery
State: Accepted/In press, 2025

Keywords

  • artificial intelligence
  • maxillofacial injuries
  • orthognathic surgery
  • surgery, oral
  • third molar surgery

ASJC Scopus subject areas

  • Surgery
  • Oral Surgery
  • Otorhinolaryngology
