TY - GEN
T1 - Towards Human-Like Educational Question Generation with Large Language Models
AU - Wang, Zichao
AU - Valdez, Jakob
AU - Basu Mallick, Debshila
AU - Baraniuk, Richard G.
N1 - Funding Information:
Acknowledgements. This work is supported by NSF grants 1842378, 1917713, 2118706, ONR grant N00014-20-1-2534, AFOSR grant FA9550-18-1-0478, and a Vannevar Bush Faculty Fellowship, ONR grant N00014-18-1-2047. We thank Prof. Sandra Adams (Excelsior College), Prof. Tyler Rust (California State University), and Prof. Julie Dinh (Baruch College, CUNY) for contributing their subject matter and instructional expertise. Thanks to the anonymous reviewers for thoughtful feedback on the manuscript.
Publisher Copyright:
© 2022, Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - We investigate the utility of large pretrained language models (PLMs) for automatic educational assessment question generation. While PLMs have shown increasing promise in a wide range of natural language applications, including question generation, they can generate unreliable and undesirable content. For high-stakes applications such as educational assessments, it is critical to ensure not only that the generated content is of high quality but also that it relates to the specific content being assessed. In this paper, we investigate the impact of various PLM prompting strategies on the quality of generated questions. We design a series of generation scenarios to compare these strategies and assess the generated questions via automatic metrics and manual examination. Through this empirical evaluation, we identify the prompting strategy most likely to yield high-quality questions. Finally, we demonstrate the promising educational utility of questions generated with this best-performing strategy by presenting them alongside human-authored questions to a subject matter expert, who, despite their expertise, could not reliably distinguish between the generated and human-authored questions.
AB - We investigate the utility of large pretrained language models (PLMs) for automatic educational assessment question generation. While PLMs have shown increasing promise in a wide range of natural language applications, including question generation, they can generate unreliable and undesirable content. For high-stakes applications such as educational assessments, it is critical to ensure not only that the generated content is of high quality but also that it relates to the specific content being assessed. In this paper, we investigate the impact of various PLM prompting strategies on the quality of generated questions. We design a series of generation scenarios to compare these strategies and assess the generated questions via automatic metrics and manual examination. Through this empirical evaluation, we identify the prompting strategy most likely to yield high-quality questions. Finally, we demonstrate the promising educational utility of questions generated with this best-performing strategy by presenting them alongside human-authored questions to a subject matter expert, who, despite their expertise, could not reliably distinguish between the generated and human-authored questions.
UR - http://www.scopus.com/inward/record.url?scp=85135953212&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85135953212&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-11644-5_13
DO - 10.1007/978-3-031-11644-5_13
M3 - Conference contribution
AN - SCOPUS:85135953212
SN - 9783031116438
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 153
EP - 166
BT - Artificial Intelligence in Education - 23rd International Conference, AIED 2022, Proceedings
A2 - Rodrigo, Maria Mercedes
A2 - Matsuda, Noboru
A2 - Cristea, Alexandra I.
A2 - Dimitrova, Vania
PB - Springer Science and Business Media Deutschland GmbH
T2 - 23rd International Conference on Artificial Intelligence in Education, AIED 2022
Y2 - 27 July 2022 through 31 July 2022
ER -