TY - JOUR
T1 - EEG interpretation reliability and interpreter confidence
T2 - A large single-center study
AU - Grant, Arthur C.
AU - Abdel-Baki, Samah G.
AU - Weedon, Jeremy
AU - Arnedo, Vanessa
AU - Chari, Geetha
AU - Koziorynska, Ewa
AU - Lushbough, Catherine
AU - Maus, Douglas
AU - McSween, Tresa
AU - Mortati, Katherine A.
AU - Reznikov, Alexandra
AU - Omurtag, Ahmet
N1 - Funding Information:
This study was supported by NIH 1RC3NS070658 to Bio-Signal Group Corp. with a subcontract to SUNY Downstate Medical Center. The authors thank Madeleine Coleman for her assistance with the manuscript preparation.
Funding Information:
Dr. Grant receives research support from the NIH, New York State, and UCB BioSciences. He serves on a professional advisory board to BioSignal Group, Inc. (BSG). All income derived from this position is donated directly from BSG to the Downstate College of Medicine Foundation.
PY - 2014/3
Y1 - 2014/3
N2 - The intrarater and interrater reliability (I&IR) of EEG interpretation has significant implications for the value of EEG as a diagnostic tool. We measured the intrarater and interrater reliability of EEG interpretation, based on the classification of complete EEGs into standard diagnostic categories, recorded rater confidence in those interpretations, and investigated sources of variance in EEG interpretation. During two distinct time intervals, six board-certified clinical neurophysiologists classified 300 EEGs into one or more of seven diagnostic categories and assigned a subjective confidence to their interpretations. Each EEG was read by three readers. Each reader interpreted 150 unique studies, and 50 studies were re-interpreted to generate intrarater data. A generalizability study assessed the contributions of subjects, readers, and the subject-reader interaction to interpretation variance. Five of the six readers had a median confidence of ≥99%, and the upper quartile of confidence values was 100% for all six readers. Intrarater Cohen's kappa (κc) ranged from 0.33 to 0.73, with an aggregated value of 0.59. Cohen's kappa ranged from 0.29 to 0.62 for the 15 reader pairs, with an aggregated Fleiss kappa of 0.44 for interrater agreement. Cohen's kappa did not differ significantly across rater pairs (chi-square=17.3, df=14, p=0.24). Variance due to subjects (i.e., EEGs) was 65.3%, due to readers 3.9%, and due to the reader-subject interaction 30.8%. Experienced epileptologists have very high confidence in their EEG interpretations and low to moderate I&IR, a common paradox in clinical medicine. A necessary, but insufficient, condition for improving EEG interpretation accuracy is increasing intrarater and interrater reliability. This goal could be accomplished, for instance, with an automated online application, integrated into a continuing medical education module, that measures and reports EEG I&IR to individual users.
AB - The intrarater and interrater reliability (I&IR) of EEG interpretation has significant implications for the value of EEG as a diagnostic tool. We measured the intrarater and interrater reliability of EEG interpretation, based on the classification of complete EEGs into standard diagnostic categories, recorded rater confidence in those interpretations, and investigated sources of variance in EEG interpretation. During two distinct time intervals, six board-certified clinical neurophysiologists classified 300 EEGs into one or more of seven diagnostic categories and assigned a subjective confidence to their interpretations. Each EEG was read by three readers. Each reader interpreted 150 unique studies, and 50 studies were re-interpreted to generate intrarater data. A generalizability study assessed the contributions of subjects, readers, and the subject-reader interaction to interpretation variance. Five of the six readers had a median confidence of ≥99%, and the upper quartile of confidence values was 100% for all six readers. Intrarater Cohen's kappa (κc) ranged from 0.33 to 0.73, with an aggregated value of 0.59. Cohen's kappa ranged from 0.29 to 0.62 for the 15 reader pairs, with an aggregated Fleiss kappa of 0.44 for interrater agreement. Cohen's kappa did not differ significantly across rater pairs (chi-square=17.3, df=14, p=0.24). Variance due to subjects (i.e., EEGs) was 65.3%, due to readers 3.9%, and due to the reader-subject interaction 30.8%. Experienced epileptologists have very high confidence in their EEG interpretations and low to moderate I&IR, a common paradox in clinical medicine. A necessary, but insufficient, condition for improving EEG interpretation accuracy is increasing intrarater and interrater reliability. This goal could be accomplished, for instance, with an automated online application, integrated into a continuing medical education module, that measures and reports EEG I&IR to individual users.
KW - Confidence
KW - EEG
KW - Interrater reliability
KW - Intrarater reliability
UR - http://www.scopus.com/inward/record.url?scp=84894080278&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84894080278&partnerID=8YFLogxK
U2 - 10.1016/j.yebeh.2014.01.011
DO - 10.1016/j.yebeh.2014.01.011
M3 - Article
C2 - 24531133
AN - SCOPUS:84894080278
SN - 1525-5050
VL - 32
SP - 102
EP - 107
JO - Epilepsy & Behavior
JF - Epilepsy & Behavior
ER -