To test the effectiveness of the proposed approach, we compare it against two
state-of-the-art systems: the learning-based question generation system for
reading comprehension of Du et al. (2017) and the best rule-based system of
Heilman and Smith (2011) (henceforth H&S). In an automatic evaluation using
standard machine translation metrics, our system outperforms both H&S's and
Du et al.'s systems on BLEU-2, METEOR, and ROUGE-L.
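A minimal sketch of this automatic evaluation is given below, assuming the nltk and rouge-score Python packages; the exact tokenization, smoothing, and corpus-level aggregation used in the paper may differ.

```python
# Sketch of sentence-level BLEU-2, METEOR, and ROUGE-L scoring.
# Assumes the nltk and rouge-score packages; the paper's actual
# evaluation pipeline is not reproduced here.
import nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

nltk.download("wordnet", quiet=True)   # required by METEOR
nltk.download("omw-1.4", quiet=True)   # wordnet data for newer nltk versions

reference = "what did the study find about question generation ?"
candidate = "what did the study find about generating questions ?"
ref_toks, cand_toks = reference.split(), candidate.split()

# BLEU-2: uniform weights over unigram and bigram precision.
bleu2 = sentence_bleu([ref_toks], cand_toks, weights=(0.5, 0.5),
                      smoothing_function=SmoothingFunction().method1)

# METEOR: alignment-based score that weights recall over precision.
meteor = meteor_score([ref_toks], cand_toks)

# ROUGE-L: F-measure over the longest common subsequence.
rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = rouge.score(reference, candidate)["rougeL"].fmeasure

print(f"BLEU-2: {bleu2:.3f}  METEOR: {meteor:.3f}  ROUGE-L: {rouge_l:.3f}")
```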
The advantage of the proposed approach is most pronounced on the METEOR
metric, which can be attributed to METEOR's recall-based nature: by
diversifying and extending the rule sets, our approach produces a broader
set of the questions that could plausibly be asked, and a recall-weighted
metric rewards this wider coverage.
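This recall orientation is explicit in METEOR's scoring: in the original formulation of Banerjee and Lavie (2005), the harmonic mean that combines unigram precision $P$ and recall $R$ weights recall nine times as heavily as precision,
\[
F_{\text{mean}} = \frac{10\,P\,R}{R + 9P},
\]
so a system that covers more of the reference wording is rewarded even at some cost in precision.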
We also conducted a human evaluation, in which the proposed system
significantly outperformed both H&S's and Du et al.'s systems and generated
the most natural (human-like) questions.
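As an illustration of how such a significance claim can be checked, the sketch below applies a Wilcoxon signed-rank test to paired naturalness ratings; the ratings and the choice of test are hypothetical, since the paper's exact protocol is not reproduced here.

```python
# Illustrative significance test on paired human ratings.
# The ratings below are hypothetical, and the paper's actual
# evaluation protocol and statistical test may differ.
from scipy.stats import wilcoxon

# Hypothetical 1-5 naturalness ratings for the same ten prompts.
ours     = [5, 4, 5, 4, 5, 3, 4, 5, 4, 5]
baseline = [3, 4, 3, 2, 4, 3, 3, 4, 3, 4]

# Wilcoxon signed-rank test on the paired differences.
stat, p_value = wilcoxon(ours, baseline)
print(f"Wilcoxon W = {stat:.1f}, p = {p_value:.4f}")
```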