I am excited to have received $40,000 from Schmidt Futures as the PI to develop a benchmark dataset of dialogic questioning. This dataset comprises high-quality dialogic questions that adults can ask children during storybook reading, and thus serves as the “ground truth” from which machine learning algorithms can learn to generate new questions of similar quality. The dataset could potentially help automate the development of high-quality dialogic learning resources.

Compared with existing datasets, ours is particularly valuable and relevant for supporting children’s language development, because its questions are grounded in evidence-based frameworks of narrative comprehension skills. The questions cover seven narrative elements that are crucial for comprehension: character, setting, feeling, action, causal relationship, outcome resolution, and prediction. We have pilot-tested a subset of our QA pairs with 120 children and found excellent psychometric properties (i.e., high validity and reliability; see our previous study).

The dataset will be made publicly available by December 2021.