Show simple item record

dc.contributor.author: Graham, Yvette
dc.date.accessioned: 2022-03-08T13:02:08Z
dc.date.available: 2022-03-08T13:02:08Z
dc.date.issued: 2021
dc.date.submitted: 2021
dc.identifier.citation: Lyu, Chenyang and Shang, Lifeng and Graham, Yvette and Foster, Jennifer and Jiang, Xin and Liu, Qun, Improving Unsupervised Question Answering via Summarization-Informed Question Generation, 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, Association for Computational Linguistics, 2021, pp. 4134–4148
dc.identifier.other: Y
dc.identifier.uri: http://hdl.handle.net/2262/98265
dc.description: PUBLISHED
dc.description: Online and Punta Cana, Dominican Republic
dc.description.abstract: Question Generation (QG) is the task of generating a plausible question for a given <passage, answer> pair. Template-based QG uses linguistically informed heuristics to transform declarative sentences into interrogatives, whereas supervised QG uses existing Question Answering (QA) datasets to train a system to generate a question given a passage and an answer. A disadvantage of the heuristic approach is that the generated questions are heavily tied to their declarative counterparts. A disadvantage of the supervised approach is that the resulting systems are heavily tied to the domain/language of the QA dataset used as training data. To overcome these shortcomings, we propose an unsupervised QG method which uses questions generated heuristically from summaries as a source of training data for a QG system. We make use of freely available news summary data, transforming declarative summary sentences into appropriate questions using heuristics informed by dependency parsing, named entity recognition and semantic role labeling. The resulting questions are then combined with the original news articles to train an end-to-end neural QG model. We extrinsically evaluate our approach using unsupervised QA: our QG model is used to generate synthetic QA pairs for training a QA model. Experimental results show that, trained with only 20k English Wikipedia-based synthetic QA pairs, the QA model substantially outperforms previous unsupervised models on three in-domain datasets (SQuAD1.1, Natural Questions, TriviaQA) and three out-of-domain datasets (NewsQA, BioASQ, DuoRC), demonstrating the transferability of the approach.
dc.format.extent: 4134
dc.format.extent: 4148
dc.language.iso: en
dc.publisher: Association for Computational Linguistics
dc.rights: Y
dc.title: Improving Unsupervised Question Answering via Summarization-Informed Question Generation
dc.title.alternative: 2021 Conference on Empirical Methods in Natural Language Processing
dc.type: Conference Paper
dc.type.supercollection: scholarly_publications
dc.type.supercollection: refereed_publications
dc.identifier.peoplefinderurl: http://people.tcd.ie/ygraham
dc.identifier.rssinternalid: 239092
dc.rights.ecaccessrights: openAccess
dc.subject.TCDTheme: Digital Engagement
dc.subject.TCDTag: Information Technology
dc.subject.TCDTag: Natural Language Processing
dc.identifier.rssuri: https://aclanthology.org/2021.emnlp-main.340.pdf
dc.identifier.orcid_id: 0000-0001-6741-4855
dc.subject.darat_thematic: Education
dc.status.accessible: N
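The abstract describes transforming declarative summary sentences into questions via heuristics informed by dependency parsing, named entity recognition and semantic role labeling. The following is a minimal, hypothetical sketch of the simplest such template rule (replacing a subject-position answer span with a wh-word); the function name and wh-word mapping are illustrative assumptions and not taken from the paper, which uses much richer SRL-informed templates.

```python
# Hypothetical illustration of one template-based QG rule, NOT the paper's
# actual system: replace an answer span in subject position with a wh-word
# chosen from the answer's (assumed pre-computed) entity type.

WH_WORD = {
    "PERSON": "Who",
    "DATE": "When",
    "LOCATION": "Where",
    "OTHER": "What",
}

def declarative_to_question(sentence: str, answer: str, answer_type: str = "OTHER"):
    """Turn a declarative sentence into a question for a subject-position answer.

    Returns None when the answer is not the sentence-initial subject, since
    non-subject answers require verb inversion / do-support, which the paper
    handles with SRL-informed templates and this sketch does not attempt.
    """
    wh = WH_WORD.get(answer_type, "What")
    body = sentence.rstrip(" .")
    if body.startswith(answer):
        # Substitute the wh-word for the answer span and re-punctuate.
        return wh + body[len(answer):] + "?"
    return None

# Example: a PERSON answer in subject position yields a "Who" question.
print(declarative_to_question(
    "Marie Curie discovered polonium in 1898.", "Marie Curie", "PERSON"))
```

A real pipeline would also need to identify the answer span and its type automatically (e.g. with an NER tagger) and handle non-subject answers, which is where the dependency-parse and semantic-role information in the paper comes in.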

