TY - GEN
T1 - Training LLM-Based Tutors to Improve Student Learning Outcomes in Dialogues
AU - Scarlatos, Alexander
AU - Liu, Naiming
AU - Lee, Jaewook
AU - Baraniuk, Richard
AU - Lan, Andrew
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
AB - Generative artificial intelligence (AI) has the potential to scale up personalized tutoring through large language models (LLMs), with recent works focusing on training or prompting LLMs to follow effective pedagogical principles. However, these models are not trained to maximize student learning throughout the course of a dialogue, so they may engage with students in suboptimal ways. We address this limitation by introducing an approach to train LLMs to generate tutor utterances that maximize the likelihood of student correctness, while still encouraging the model to follow good pedagogical practice. Specifically, we generate a set of candidate tutor utterances and score them using (1) an LLM-based student model to predict the chance of correct student responses and (2) a pedagogical rubric evaluated by GPT-4o. We then use the resulting data to train an open-source LLM, Llama 3.1 8B, using direct preference optimization (DPO). We show that tutor utterances generated by our model lead to significantly higher chances of correct student responses while maintaining the pedagogical quality of GPT-4o. We also conduct qualitative analyses and a human evaluation to demonstrate that our model generates high-quality tutor utterances. (This work is partially supported by Renaissance Philanthropy via the Learning Engineering Virtual Institute (LEVI) and NSF grants 2118706, 2237676, and 2341948.) (Our code is available at https://github.com/umass-ml4ed/tutorbot-dpo.)
KW - Large Language Models
KW - Math Education
KW - Reinforcement Learning
KW - Tutor-Student Dialogues
UR - https://www.scopus.com/pages/publications/105012033737
UR - https://www.scopus.com/inward/citedby.url?scp=105012033737&partnerID=8YFLogxK
DO - 10.1007/978-3-031-98414-3_18
M3 - Conference contribution
AN - SCOPUS:105012033737
SN - 978-3-031-98413-6
T3 - Lecture Notes in Computer Science
SP - 251
EP - 266
BT - Artificial Intelligence in Education - 26th International Conference, AIED 2025, Proceedings
A2 - Cristea, Alexandra I.
A2 - Walker, Erin
A2 - Lu, Yu
A2 - Santos, Olga C.
A2 - Isotani, Seiji
PB - Springer Science and Business Media Deutschland GmbH
T2 - 26th International Conference on Artificial Intelligence in Education, AIED 2025
Y2 - 22 July 2025 through 26 July 2025
ER -