Show simple item record

dc.contributor.author: Graham, Yvette
dc.date.accessioned: 2022-03-08T13:02:08Z
dc.date.available: 2022-03-08T13:02:08Z
dc.date.issued: 2021
dc.date.submitted: 2021
dc.identifier.citation: Lyu, Chenyang and Shang, Lifeng and Graham, Yvette and Foster, Jennifer and Jiang, Xin and Liu, Qun, Improving Unsupervised Question Answering via Summarization-Informed Question Generation, 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, Association for Computational Linguistics, 2021, pp. 4134–4148
dc.identifier.other: Y
dc.identifier.uri: http://hdl.handle.net/2262/98265
dc.description: PUBLISHED
dc.description: Online and Punta Cana, Dominican Republic
dc.description.abstract: Question Generation (QG) is the task of generating a plausible question for a given <passage, answer> pair. Template-based QG uses linguistically informed heuristics to transform declarative sentences into interrogatives, whereas supervised QG uses existing Question Answering (QA) datasets to train a system to generate a question given a passage and an answer. A disadvantage of the heuristic approach is that the generated questions are heavily tied to their declarative counterparts. A disadvantage of the supervised approach is that the resulting systems are heavily tied to the domain/language of the QA dataset used as training data. To overcome these shortcomings, we propose an unsupervised QG method which uses questions generated heuristically from summaries as a source of training data for a QG system. We make use of freely available news summary data, transforming declarative summary sentences into appropriate questions using heuristics informed by dependency parsing, named entity recognition and semantic role labeling. The resulting questions are then combined with the original news articles to train an end-to-end neural QG model. We extrinsically evaluate our approach using unsupervised QA: our QG model is used to generate synthetic QA pairs for training a QA model. Experimental results show that, trained with only 20k English Wikipedia-based synthetic QA pairs, the QA model substantially outperforms previous unsupervised models on three in-domain datasets (SQuAD1.1, Natural Questions, TriviaQA) and three out-of-domain datasets (NewsQA, BioASQ, DuoRC), demonstrating the transferability of the approach.
dc.format.extent: 4134
dc.format.extent: 4148
dc.language.iso: en
dc.publisher: Association for Computational Linguistics
dc.rights: Y
dc.title: Improving Unsupervised Question Answering via Summarization-Informed Question Generation
dc.title.alternative: 2021 Conference on Empirical Methods in Natural Language Processing
dc.type: Conference Paper
dc.type.supercollection: scholarly_publications
dc.type.supercollection: refereed_publications
dc.identifier.peoplefinderurl: http://people.tcd.ie/ygraham
dc.identifier.rssinternalid: 239092
dc.rights.ecaccessrights: openAccess
dc.subject.TCDTheme: Digital Engagement
dc.subject.TCDTag: Information Technology
dc.subject.TCDTag: Natural Language Processing
dc.identifier.rssuri: https://aclanthology.org/2021.emnlp-main.340.pdf
dc.identifier.orcid_id: 0000-0001-6741-4855
dc.subject.darat_thematic: Education
dc.status.accessible: N
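The abstract describes transforming declarative summary sentences into questions via heuristics informed by dependency parsing, named entity recognition and semantic role labeling. The following is a minimal, hypothetical sketch of the simplest such template rule (replacing a subject-position answer span with a wh-word); the function name and wh-word mapping are illustrative assumptions and not taken from the paper, which uses much richer SRL-informed templates.

```python
# Hypothetical illustration of one template-based QG rule, NOT the paper's
# actual system: replace an answer span in subject position with a wh-word
# chosen from the answer's (assumed pre-computed) entity type.

WH_WORD = {
    "PERSON": "Who",
    "DATE": "When",
    "LOCATION": "Where",
    "OTHER": "What",
}

def declarative_to_question(sentence: str, answer: str, answer_type: str = "OTHER"):
    """Turn a declarative sentence into a question for a subject-position answer.

    Returns None when the answer is not the sentence-initial subject, since
    non-subject answers require verb inversion / do-support, which the paper
    handles with SRL-informed templates and this sketch does not attempt.
    """
    wh = WH_WORD.get(answer_type, "What")
    body = sentence.rstrip(" .")
    if body.startswith(answer):
        # Substitute the wh-word for the answer span and re-punctuate.
        return wh + body[len(answer):] + "?"
    return None

# Example: a PERSON answer in subject position yields a "Who" question.
print(declarative_to_question(
    "Marie Curie discovered polonium in 1898.", "Marie Curie", "PERSON"))
```

A real pipeline would also need to identify the answer span and its type automatically (e.g. with an NER tagger) and handle non-subject answers, which is where the dependency-parse and semantic-role information in the paper comes in.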

