Author: Sun, Shichao
Title: Reading, writing, and refining for text summarization
Advisors: Li, Wenjie Maggie (COMP)
Degree: Ph.D.
Year: 2024
Subject: Text processing (Computer science)
Automatic abstracting
Natural language processing (Computer science)
Machine learning
Hong Kong Polytechnic University -- Dissertations
Department: Department of Computing
Pages: xx, 188 pages : color illustrations
Language: English
Abstract: Text summarization is a critical task in natural language processing, in which a lengthy text is condensed into a concise version that highlights the key points of the original source. In today's fast-paced digital environment, text summarization has become a valuable tool: it offers a remedy for information overload by enabling quicker understanding and assimilation of knowledge. Recently, text summarization has received increasing attention from the academic community.
As pre-trained language models advance, they imitate the human summarization pipeline of reading, writing, and refining, and have achieved impressive performance. However, these developments also bring new challenges to every stage of this pipeline. Current language models struggle to understand the different kinds of documents handled by different summarization methods, such as news articles in unsupervised extractive summarization and meeting transcripts in supervised abstractive summarization. This limitation prevents existing summarization methods from producing concise and coherent summaries. Additionally, pre-trained language models are not originally crafted for the specialized purpose of text summarization. Supervised fine-tuning with summarization data suffers from exposure bias and cannot elicit the full capabilities of pre-trained language models, so there remains untapped potential for them to deliver even better performance. Lastly, much as with human writers, a summary generated on the first attempt usually contains flaws. It is therefore essential to devise novel approaches for refining these initial summaries into superior versions, yet this research area remains largely unexplored. In this thesis, we distill the aforementioned challenges into three research problems: (1) How can we improve semantic understanding of the input document? (2) How can we enhance the summary writing of pre-trained language models? (3) How can we reliably refine the generated summary for superior quality?
To tackle the challenges outlined above, we propose a range of approaches that enhance every stage of the text summarization pipeline. The body of research presented in this thesis is structured into three parts.
In the first part (works 1 and 2), we explore problem (1) for unsupervised and supervised summarization. Most recent work on unsupervised summarization of news articles needs to understand each sentence and estimate the similarity between pairs of sentences. However, sentence similarity estimated with pre-trained language models performs worse in unsupervised summarization than similarity computed from statistical vector representations such as TF-IDF. In work 1, we propose mutual learning with an extra signal amplifier, in which different granularities of pivotal information for summarization are learned from each other to enhance semantic understanding. On the other hand, lengthy and complex documents, such as meeting transcripts, pose severe challenges for supervised summarization. In work 2, we focus on meeting summarization, which condenses the key information in meeting transcripts into concise textual summaries. This task is particularly challenging because of the intricate nature of multi-party interactions and the potentially hundreds of participants' utterances within a single meeting. We explore and confirm the viability of employing dialogue acts, which represent the functions that utterances serve in dialogue-based interaction, to improve understanding of meeting transcripts. By integrating dialogue acts, we enhance the classical extract-abstract framework for meeting summarization.
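The contrast between TF-IDF-based and pre-trained-model-based sentence similarity mentioned above can be illustrated with the classical graph-based extractive baseline that work 1 builds on. The Python sketch below ranks sentences by centrality in a TF-IDF cosine-similarity graph (a TextRank-style ranker); it is only an illustration of that common baseline, not the mutual-learning method proposed in the thesis.

import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extract_summary(sentences, num_sentences=3):
    """Rank sentences by centrality in a TF-IDF similarity graph."""
    # Represent each sentence with a statistical TF-IDF vector.
    tfidf = TfidfVectorizer().fit_transform(sentences)
    # Pairwise cosine similarity between sentences gives the edge weights.
    sim = cosine_similarity(tfidf)
    scores = nx.pagerank(nx.from_numpy_array(sim))
    # Keep the top-ranked sentences, restored to document order.
    top = sorted(scores, key=scores.get, reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(top)]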
In the second part (works 3 and 4), we investigate problem (2) with contrastive learning. Supervised fine-tuning on summarization data is the de facto standard practice for text summarization, but it suffers from exposure bias, a discrepancy between model training and inference. In work 3, we alleviate this issue by introducing a novel contrastive learning loss that integrates an extra negative sample into the fine-tuning process, improving the model's resilience to exposure bias. Furthermore, we observe that while a limited amount of data can yield highly promising results, excessive data can degrade performance, potentially due to overfitting. In work 4, we therefore explore methods for data selection and data ordering, with the goal of preserving the efficiency and stability observed in work 3.
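To make the idea in work 3 concrete, the sketch below shows one common way to add a contrastive term to supervised fine-tuning: the reference summary is pushed to score above a model-generated negative sample by a margin. The scoring function, margin, and loss weighting here are illustrative assumptions, not the exact formulation used in the thesis.

import torch
import torch.nn.functional as F

def contrastive_fine_tuning_loss(pos_logprob, neg_logprob, nll_loss,
                                 margin=1.0, weight=0.5):
    """Combine the usual MLE loss with a margin-based contrastive term.

    pos_logprob: length-normalized log-probability of the reference summary
    neg_logprob: length-normalized log-probability of a negative sample
    nll_loss:    standard token-level cross-entropy (teacher-forcing) loss
    """
    # Penalize the model whenever the negative sample is scored too close
    # to, or above, the reference summary.
    contrastive = F.relu(margin - (pos_logprob - neg_logprob))
    return nll_loss + weight * contrastive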
In the third part (works 5 and 6), we address problem (3). Recently, a new paradigm, generating-critiquing-refining, has been introduced into text generation tasks such as summarization: we can prompt large language models to draft a summary, generate critiques of the draft, and then use the critiques to polish the draft further. In work 5, we study which of two prompting strategies, prompt chaining and stepwise prompting, better realizes this generating-critiquing-refining workflow for refinement in text summarization. In work 6, motivated by the intuition that a superior critique leads to better refinement, we first devise an evaluation method for critiques. We then validate the hypothesis that a superior critique directly contributes to producing a better summary. Beyond text summarization, we conduct extensive experiments on other text generation tasks, including entailment, reasoning, and question answering, to demonstrate that critiques can play a constructive role in polishing the outputs of a wide range of text generation tasks.
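The generating-critiquing-refining workflow compared in work 5 can be sketched as follows, with prompt chaining issuing one model call per stage and a stepwise prompt packing all three stages into a single call. The llm argument is a hypothetical text-in, text-out callable, and the prompts are illustrative only; they do not reproduce the prompts studied in the thesis.

def refine_by_prompt_chaining(llm, document):
    # Three separate calls: draft, critique, then refine using the critique.
    draft = llm(f"Summarize the following document:\n{document}")
    critique = llm(f"Document:\n{document}\n\nDraft summary:\n{draft}\n\n"
                   "List the factual errors and omissions in this draft.")
    refined = llm(f"Document:\n{document}\n\nDraft summary:\n{draft}\n\n"
                  f"Critique:\n{critique}\n\n"
                  "Rewrite the summary to address the critique.")
    return refined

def refine_by_stepwise_prompt(llm, document):
    # A single call that asks the model to perform all three stages in order.
    return llm(
        f"Document:\n{document}\n\n"
        "Step 1: Draft a summary.\n"
        "Step 2: Critique your draft.\n"
        "Step 3: Output only the refined summary."
    )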
To sum up, we conduct a comprehensive study of text summarization powered by pre-trained language models, ranging from understanding the input text and tuning the models to refining the generated summary. By applying our proposed methods to real-world datasets, we demonstrate significant improvements.
Rights: All rights reserved
Access: open access

Files in This Item:
File: 7644.pdf (For All Users), 4.98 MB, Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13192