Abstract
This thesis proposes a novel rule-based method for automatic question
generation. The method analyzes the syntactic and semantic structure of a
sentence. The design and implementation of the approach are described in detail.
Although the system's primary goal is question generation from sentences,
automatic evaluation shows that it also performs well on reading comprehension
datasets, which place more emphasis on question generation from paragraphs. In
human evaluation, the system significantly outperforms all other systems and
produces the most natural (human-like) questions. If high-quality questions can
be generated successfully, possible applications include:
• Automatically generating simple questions for reading comprehension tests.
• Generating additional data for QA datasets.
• Training QA models in a semi-supervised manner.
KEY WORDS:
Natural Language Processing (NLP), Natural Language Understanding (NLU),
Natural Language Generation (NLG), Automatic Question Generation,
Question Answering (QA).
Introduction
Natural Language Processing (NLP) is a subfield of artificial intelligence (AI).
Research in this area deals with natural language, the language people use every
day, so it is closely related to linguistics, although the two fields differ.
NLP is not the general study of natural language; rather, it is concerned with
building computer systems, particularly software systems, that can communicate
effectively in natural language.
Once natural language communication between humans and computers is realized,
the computer will be able to understand the meaning of natural language texts
and to express specific thoughts and intentions in natural language. The former
is called Natural Language Understanding (NLU) and the latter Natural Language
Generation (NLG).
The question-answering (QA) task is an essential part of Natural Language
Processing (NLP) or, more specifically, of Natural Language Understanding (NLU).
By simulating a reading comprehension exam, we assume that a computer has
reached a certain level of understanding if it can answer questions about a
corpus after "reading" it. Over the past few years, models that perform well on
several well-known QA datasets have been developed rapidly, and some of them
even exceed human performance. In this degree project, rather than further
developing models for the QA task, we reverse the process and generate questions
given the answers and the accompanying text. Because the QA task is designed to
test a machine's reading comprehension, the generated questions should, to some
extent, reflect an understanding of the corpus. Since a question cannot simply
be extracted from the text, this project involves Natural Language Generation
(NLG) in addition to the NLU component.