Completion of a large-scale benchmark dataset of dialogic questions for reading comprehension

January 10, 2022

With funding from Schmidt Futures, I’m pleased to announce that my team has completed the development of a benchmark dataset of dialogic questions for children’s books. This dataset, FairytaleQA, consists of 10,000 questions drawn from almost 300 public-domain books and covers seven types of narrative elements and relations. We believe that FairytaleQA will be valuable for the fields of natural language processing and education. In particular, it will enable the automatic generation of questions for reading instruction and assessment.

The advantages of FairytaleQA are as follows:

We employed education experts to generate question-answer pairs based on evidence-based narrative comprehension frameworks, thus increasing the validity and reliability of the assessment.
FairytaleQA contains both explicit questions, whose answers are found directly in the text, and implicit questions, which require inference making and high-level summarization, thus representing a relatively balanced assessment with questions of varying difficulty levels.
Our selection of annotators with education domain knowledge as well as the training and quality control process ensured that the aforementioned annotation protocol was consistently implemented.

We are working on releasing FairytaleQA, and it will be made available to the public very soon. In the meantime, please contact me if you would like to access the dataset.


Ying Xu, Ph.D.