Faculty: Marti A. Hearst
Students: Katie Stasaski (PhD)

We are interested in leveraging large neural dialogue models for foreign language tutoring. To this end, we construct a new dataset, CIMA, collected from crowdworkers role-playing as students and tutors. Novel features of the dataset include two overlapping domains of different difficulty, tutoring conversations grounded in an image, tutor and student responses coded with actions, and multiple human-generated tutoring responses for each point in the student conversation. Building on this work, we develop a novel methodology for collecting linguistically diverse dialogue responses from crowdworkers, so that future corpora can include creative tutoring responses and represent a wider range of utterances.

Publications

Stasaski, K., Kao, K., and Hearst, M.A., Construction of a Large Open Access Dialogue Dataset for Tutoring, BEA Workshop, ACL 2020.
Stasaski, K., Yang, G., and Hearst, M.A., More Diverse Dialogue Datasets via Diversity-Informed Data Collection, ACL 2020.