Author: | Yuan, Ruifeng |
Title: | Exploring text summarization beyond news articles |
Advisors: | Li, Wenjie Maggie (COMP) |
Degree: | Ph.D. |
Year: | 2024 |
Subject: | Automatic abstracting; Electronic information resources -- Abstracting and indexing; Natural language processing (Computer science); Hong Kong Polytechnic University -- Dissertations |
Department: | Department of Computing |
Pages: | xx, 172 pages : color illustrations |
Language: | English |
Abstract: | Text summarization is an important task in natural language processing. It aims to compress the source document(s) into a more concise version that covers the most important information. In recent years, with the development of neural language models, text summarization has made great progress. In this process, news summarization has undoubtedly been the most important research topic in the field. On the one hand, news summarization has natural real-life application scenarios. On the other hand, a set of large-scale news summarization datasets have been proposed to meet the data requirements of neural models. Therefore, for a considerable period, building general summarization models on news data has been the mainstream paradigm in text summarization. With the continuous advancement of summarization models and techniques, researchers are no longer confined to this paradigm but are exploring or rediscovering more diverse text summarization problems. These problems often have unique characteristics, which means general approaches cannot be blindly applied; at the same time, they still share many similarities with the mainstream paradigm. In this thesis, we investigate text summarization problems beyond the "news data + general model" mainstream paradigm. More specifically, we identify three research problems to be addressed: 1. How can the natural features of news articles be used to improve general summarization models on news summarization? 2. How can current general summarization models be extended to summarization tasks/domains where they cannot be directly applied? 3. How can the large-scale data in news summarization be utilized to assist summarization tasks/domains with insufficient data? To address these problems, we develop summarization approaches for specific domains or tasks. Based on the three research questions, the thesis is naturally divided into three parts. 
In the first part (work 1 and work 2), to enhance news summarization using its unique characteristics, we propose to incorporate a typical kind of extra information, event-level information, into the summarization model. The research target is to investigate what role event-level information plays in both extractive and abstractive news summarization, and how to make good use of it. In work 1, we propose to extract event-level semantic units for better extractive news summarization. We also introduce a hierarchical structure that incorporates multiple levels of textual granularity into the model. In work 2, we explore an effective sentence fusion approach that fuses extracted salient information into abstractive summary sentences. We propose to build an event graph from the input sentences to capture and organize related events in a structured way, and we use the constructed graph to guide summarization. In addition to attending over the content of sentences and graph nodes, we develop a graph flow attention mechanism that controls the fusion process via the graph structure to improve the faithfulness of the fused summaries. Experiments and ablation studies on news datasets demonstrate the effectiveness of event-level information in news summarization. The second part (work 3) explores text summarization problems to which general summarization models cannot be directly applied, the most representative of which is long-input summarization. Because general summarization models struggle with long inputs due to their high memory cost, they cannot be directly applied to documents with thousands of tokens. The main challenge is how to effectively extend mature summarization techniques while efficiently handling the difficulties brought by the long input. In work 3, we present a context-aware extract-generate framework (CAEG) for long-input text summarization. 
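The event-graph idea in work 2 can be illustrated with a minimal sketch: events are represented as (subject, predicate, object) triples, and two events are linked when they share an argument. Both the triple representation and the shared-argument edge rule are simplifying assumptions made here for illustration, not the thesis's actual construction.

```python
# Illustrative sketch of an event graph over sentence-level events.
# Assumption: events are given as (subject, predicate, object) triples,
# and events sharing an argument are connected by an edge.
from collections import defaultdict

def build_event_graph(triples):
    """Nodes are event triples; edges connect events that share an argument."""
    nodes = list(triples)
    arg_index = defaultdict(list)              # argument -> list of node ids
    for i, (subj, _pred, obj) in enumerate(nodes):
        arg_index[subj].append(i)
        arg_index[obj].append(i)
    edges = set()
    for ids in arg_index.values():
        for a in ids:
            for b in ids:
                if a < b:                      # one edge per unordered pair
                    edges.add((a, b))
    return nodes, sorted(edges)

triples = [
    ("court", "convicted", "defendant"),
    ("defendant", "appealed", "verdict"),
    ("judge", "upheld", "verdict"),
]
nodes, edges = build_event_graph(triples)
# events 0 and 1 share "defendant"; events 1 and 2 share "verdict"
```

A summarizer could then traverse such a graph to decide which events to fuse into one summary sentence, which is the role the constructed graph plays in guiding fusion.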
It focuses on preserving both local and global context information in an extract-generate framework at little cost. CAEG generates a set of context-related text spans, called context prompts, for each text snippet and uses them to transfer context information between the extractor and the generator. To find such context prompts, we propose to capture context information based on an interpretation of the extractor: the text spans with the highest contribution to the extraction decision are considered to contain the richest context information. Experiments show the effectiveness and efficiency of our model in capturing and preserving context information in long-input summarization. The third part (work 4 and work 5) delves into problem 3 and investigates how to effectively transfer summarization knowledge learned from news data to tasks or domains with insufficient data. Work 4 explores this problem from the perspective of task knowledge transfer in the context of query-focused summarization. In this work, we investigate whether we can integrate and transfer the knowledge of news summarization and question answering to assist few-shot learning in query-focused summarization. We propose prefix-merging, a prefix-based pre-training strategy for few-shot learning in query-focused summarization: we integrate the task knowledge from text summarization and question answering into a properly designed prefix and apply the merged prefix to query-focused summarization. In addition to task knowledge transfer, we investigate domain transfer for extractive summarization in work 5. In text summarization, context information is considered a key factor. Meanwhile, there also exist pattern factors that signal sentence importance, such as sentence position or certain n-gram tokens. 
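The context-prompt selection in work 3 can be sketched as picking the spans whose tokens contributed most to the extractor's decision. In a real system the per-token scores would come from interpreting the extractor (e.g. gradient- or attention-based attribution); here they are supplied directly, which, along with the fixed span length and the function name, is an assumption for illustration only.

```python
# Minimal sketch: choose "context prompts" as the highest-attribution
# token spans. Attribution scores are given as inputs here; deriving
# them from a trained extractor is outside this sketch.

def select_context_prompts(tokens, scores, span_len=2, k=1):
    """Return the k non-overlapping spans of span_len tokens with the
    highest total attribution score."""
    spans = [
        (sum(scores[i:i + span_len]), i)
        for i in range(len(tokens) - span_len + 1)
    ]
    spans.sort(reverse=True)                   # best-scoring spans first
    chosen, used = [], set()
    for _total, i in spans:
        if len(chosen) == k:
            break
        if any(j in used for j in range(i, i + span_len)):
            continue                           # skip overlapping spans
        chosen.append(" ".join(tokens[i:i + span_len]))
        used.update(range(i, i + span_len))
    return chosen

tokens = ["the", "merger", "was", "approved", "yesterday"]
scores = [0.2, 0.9, 0.1, 0.8, 0.7]
prompts = select_context_prompts(tokens, scores, span_len=2, k=1)
# -> ["approved yesterday"]
```

The selected spans would then be prepended to each snippet so the generator sees the context that drove the extraction decision.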
In this work, we apply disentangled representation learning to extractive summarization, separating the task's two key factors, context and pattern, for better generalization in the low-resource setting. The experiments suggest that the knowledge contained in large-scale news summarization data has great potential for improving summarization systems in other tasks or domains. In conclusion, we study the proposed research problems of text summarization in a systematic way. We illustrate the importance of these problems and demonstrate the effectiveness of our approaches on various datasets. This shows the potential of our work to benefit real-world applications such as news summarization, academic research, and medical record collection. |
Rights: | All rights reserved |
Access: | open access |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/13041