TY - JOUR
T1 - Association of reviewer experience with discriminating human-written versus ChatGPT-written abstracts
AU - Levin, Gabriel
AU - Pareja, Rene
AU - Viveros-Carreño, David
AU - Diaz, Emmanuel Sanchez
AU - Yates, Elise Mann
AU - Zand, Behrouz
AU - Ramirez, Pedro T.
N1 - Publisher Copyright:
© 2024 BMJ Publishing Group. All rights reserved.
PY - 2024/5/1
Y1 - 2024/5/1
N2 - Objective To determine if reviewer experience impacts the ability to discriminate between human-written and ChatGPT-written abstracts. Methods Thirty reviewers (10 seniors, 10 juniors, and 10 residents) were asked to differentiate between 10 ChatGPT-written and 10 human-written (fabricated) abstracts. For the study, 10 gynecologic oncology abstracts were fabricated by the authors. For each human-written abstract we generated a ChatGPT matching abstract by using the same title and the fabricated results of each of the human generated abstracts. A web-based questionnaire was used to gather demographic data and to record the reviewers’ evaluation of the 20 abstracts. Comparative statistics and multivariable regression were used to identify factors associated with a higher correct identification rate. Results The 30 reviewers discriminated 20 abstracts, giving a total of 600 abstract evaluations. The reviewers were able to correctly identify 300/600 (50%) of the abstracts: 139/300 (46.3%) of the ChatGPT-generated abstracts and 161/300 (53.7%) of the human-written abstracts (p=0.07). Human-written abstracts had a higher rate of correct identification (median (IQR) 56.7% (49.2–64.1%) vs 45.0% (43.2–48.3%), p=0.023). Senior reviewers had a higher correct identification rate (60%) than junior reviewers and residents (45% each; p=0.043 and p=0.002, respectively). In a linear regression model including the experience level of the reviewers, familiarity with artificial intelligence (AI) and the country in which the majority of medical training was achieved (English speaking vs non-English speaking), the experience of the reviewer (β=10.2 (95% CI 1.8 to 18.7)) and familiarity with AI (β=7.78 (95% CI 0.6 to 15.0)) were independently associated with the correct identification rate (p=0.019 and p=0.035, respectively). In a correlation analysis the number of publications by the reviewer was positively correlated with the correct identification rate (r28)=0.61, p<0.001. Conclusion A total of 46.3% of abstracts written by ChatGPT were detected by reviewers. The correct identification rate increased with reviewer and publication experience.
AB - Objective To determine if reviewer experience impacts the ability to discriminate between human-written and ChatGPT-written abstracts. Methods Thirty reviewers (10 seniors, 10 juniors, and 10 residents) were asked to differentiate between 10 ChatGPT-written and 10 human-written (fabricated) abstracts. For the study, 10 gynecologic oncology abstracts were fabricated by the authors. For each human-written abstract we generated a ChatGPT matching abstract by using the same title and the fabricated results of each of the human generated abstracts. A web-based questionnaire was used to gather demographic data and to record the reviewers’ evaluation of the 20 abstracts. Comparative statistics and multivariable regression were used to identify factors associated with a higher correct identification rate. Results The 30 reviewers discriminated 20 abstracts, giving a total of 600 abstract evaluations. The reviewers were able to correctly identify 300/600 (50%) of the abstracts: 139/300 (46.3%) of the ChatGPT-generated abstracts and 161/300 (53.7%) of the human-written abstracts (p=0.07). Human-written abstracts had a higher rate of correct identification (median (IQR) 56.7% (49.2–64.1%) vs 45.0% (43.2–48.3%), p=0.023). Senior reviewers had a higher correct identification rate (60%) than junior reviewers and residents (45% each; p=0.043 and p=0.002, respectively). In a linear regression model including the experience level of the reviewers, familiarity with artificial intelligence (AI) and the country in which the majority of medical training was achieved (English speaking vs non-English speaking), the experience of the reviewer (β=10.2 (95% CI 1.8 to 18.7)) and familiarity with AI (β=7.78 (95% CI 0.6 to 15.0)) were independently associated with the correct identification rate (p=0.019 and p=0.035, respectively). In a correlation analysis the number of publications by the reviewer was positively correlated with the correct identification rate (r28)=0.61, p<0.001. Conclusion A total of 46.3% of abstracts written by ChatGPT were detected by reviewers. The correct identification rate increased with reviewer and publication experience.
UR - http://www.scopus.com/inward/record.url?scp=85191877064&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85191877064&partnerID=8YFLogxK
U2 - 10.1136/ijgc-2023-005162
DO - 10.1136/ijgc-2023-005162
M3 - Article
C2 - 38627032
AN - SCOPUS:85191877064
SN - 1048-891X
VL - 34
SP - 669
EP - 674
JO - International Journal of Gynecological Cancer
JF - International Journal of Gynecological Cancer
IS - 5
ER -