Author: Ren, Da
Title: Representation modeling based language GANs : from autoregressive models to non-autoregressive models
Degree: Ph.D.
Year: 2024
Department: Department of Computing
Pages: xiii, 119 pages : color illustrations
Language: English
Abstract: Training autoregressive models with maximum likelihood estimation (MLE) has become a mainstream method in text generation. However, this method has two inherent limitations. First, the discrepancy between training and inference causes the exposure bias problem. Second, these models are based on autoregressive structures, which have high latency during inference, and are thus unsuitable for scenarios requiring low latency. In contrast, Generative Adversarial Networks (GANs) are free from the exposure bias problem and have the potential to construct non-autoregressive (NAR) models. However, GANs have their own limitations in text generation.
The first limitation is how to use the signals from discriminators to update generators. In text generation, tokens are sampled from probability distributions, and the sampling operation prevents gradients from being passed to generators. Existing methods, which model output probabilities, are either high-variance or biased estimators. Instead, we first transform words into representations and then train the generator to recover these representations. We refer to these methods as representation modeling methods. We adopt dropout sampling and a fully normalized LSTM to provide a more effective sampling method and maintain healthier gradients. Our proposed model outperforms MLE-based models and existing GAN-based models on various evaluation metrics.
Nevertheless, most existing language GANs are based on autoregressive structures, which have high latency. We thus build GAN-based NAR models to obtain results more efficiently. We divide text generation tasks into two categories: incomplete information scenarios and complete information scenarios.
For incomplete information scenarios, in which the target contains more information than the input, the multi-modality problem of MLE-based NAR models is further amplified: each input has many diverse candidate outputs, which are more easily mixed together. Language GANs tend to generate ungrammatical sentences after adopting NAR structures. The input representations obtained by existing methods are similar across different positions. In addition, the Transformer builds word dependencies based only on the attention mechanism, and this process becomes unstable during GAN training. We tackle these problems by proposing two components: 1) Position-Aware Self-Modulation, which provides more effective input signals, and 2) Dependency Feed Forward Network, which strengthens the feed-forward network layer with the capacity for dependency modeling. The experimental results demonstrate that our proposed model achieves performance comparable to existing mainstream models with far fewer decoding iterations.
For complete information scenarios, in which the input contains complete information about the output, the complicated mapping relations cause greater errors in the learned marginal distributions of MLE-based NAR models and thus exacerbate their multi-modality problem. Even our previously proposed GAN-based NAR model fails to achieve satisfactory performance because it cannot model these complicated relations. To tackle this problem, we first revise the discriminator structure to make use of unpaired samples. Then, we integrate a reconstruction procedure to better utilize paired samples. We evaluate our proposed model on image captioning, where it achieves a new state of the art for fully NAR models on the MSCOCO dataset with a much higher speedup and fewer parameters.
Rights: All rights reserved
Access: open access

Files in This Item:
File: 7833.pdf
Description: For All Users
Size: 5.05 MB
Format: Adobe PDF



Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13412