TY - GEN
T1 - Automated Scoring for Reading Comprehension via In-context BERT Tuning
AU - Fernandez, Nigel
AU - Ghosh, Aritra
AU - Liu, Naiming
AU - Wang, Zichao
AU - Choffin, Benoît
AU - Baraniuk, Richard
AU - Lan, Andrew
N1 - Publisher Copyright:
© 2022, Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Automated scoring of open-ended student responses has the potential to significantly reduce human grader effort. Recent advances in automated scoring leverage textual representations from pre-trained language models such as BERT. Existing approaches train a separate model for each item/question, which is suitable for scenarios such as essay scoring where items differ from one another. However, these approaches have two limitations: 1) they fail to exploit item linkage in scenarios such as reading comprehension, where multiple items may share a reading passage; 2) they do not scale, since storing one model per item is difficult with large language models. We report our (grand prize-winning) solution to the National Assessment of Educational Progress (NAEP) automated scoring challenge for reading comprehension. Our approach, in-context BERT fine-tuning, produces a single shared scoring model for all items, using a carefully designed input structure to provide contextual information on each item. Our experiments demonstrate the effectiveness of our approach, which outperforms existing methods. We also perform a qualitative analysis and discuss the limitations of our approach. (Full version of the paper: https://arxiv.org/abs/2205.09864. Our implementation: https://github.com/ni9elf/automated-scoring)
AB - Automated scoring of open-ended student responses has the potential to significantly reduce human grader effort. Recent advances in automated scoring leverage textual representations from pre-trained language models such as BERT. Existing approaches train a separate model for each item/question, which is suitable for scenarios such as essay scoring where items differ from one another. However, these approaches have two limitations: 1) they fail to exploit item linkage in scenarios such as reading comprehension, where multiple items may share a reading passage; 2) they do not scale, since storing one model per item is difficult with large language models. We report our (grand prize-winning) solution to the National Assessment of Educational Progress (NAEP) automated scoring challenge for reading comprehension. Our approach, in-context BERT fine-tuning, produces a single shared scoring model for all items, using a carefully designed input structure to provide contextual information on each item. Our experiments demonstrate the effectiveness of our approach, which outperforms existing methods. We also perform a qualitative analysis and discuss the limitations of our approach. (Full version of the paper: https://arxiv.org/abs/2205.09864. Our implementation: https://github.com/ni9elf/automated-scoring)
KW - Automated scoring
KW - BERT
KW - Reading comprehension
UR - http://www.scopus.com/inward/record.url?scp=85135836946&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85135836946&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-11644-5_69
DO - 10.1007/978-3-031-11644-5_69
M3 - Conference contribution
AN - SCOPUS:85135836946
SN - 9783031116438
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 691
EP - 697
BT - Artificial Intelligence in Education - 23rd International Conference, AIED 2022, Proceedings
A2 - Rodrigo, Maria Mercedes
A2 - Matsuda, Noboru
A2 - Cristea, Alexandra I.
A2 - Dimitrova, Vania
PB - Springer Science and Business Media Deutschland GmbH
T2 - 23rd International Conference on Artificial Intelligence in Education, AIED 2022
Y2 - 27 July 2022 through 31 July 2022
ER -