Author: Mo, Feiyu
Title: Improving transformer on generation tasks with sequence information
Advisors: Li, Wenjie (COMP)
Degree: M.Sc.
Year: 2020
Subject: Text processing (Computer science)
Natural language processing (Computer science)
Hong Kong Polytechnic University -- Dissertations
Department: Department of Computing
Pages: ii, viii, 71 pages : color illustrations
Language: English
Abstract: In recent years, the transformer model [Vaswani et al., 2017] has won many text generation tasks [Radford et al., 2018; Devlin et al., 2018]. It represents input sequence information with a position matrix and relies solely on attention mechanisms [Vaswani et al., 2017] in both the encoder and the decoder. However, a great deal of work has shown that the position matrix is a weakness of the transformer [Dehghani et al., 2019; Li et al., 2016; Lin et al., 2017; Shaw et al., 2018; Shi et al., 2018; Yang et al., 2018]. This thesis therefore continues the effort to address the lack of sequence information in the transformer. It investigates and experiments in three directions: replacement, enhancement and assistance. The first direction replaces the position matrix in the transformer with an RNN module [Hochreiter et al., 1997; Mikolov et al., 2010]. The second direction makes more efficient use of the position matrix, either by adding it to every self-attention layer or by pre-training it, so that the model obtains more sequence information. The third direction guides model training with an imbalanced loss function that carries the sequence information of the output. The metric used in the experiments is BLEU [Papineni et al., 2002]; the number of parameters and the efficiency of each model are also analyzed. The datasets used are the IWSLT16 German (Deutsch) ⇒ English dataset and the WMT14 English ⇒ German (Deutsch) dataset. Experiments in the first and second directions were conducted only on the IWSLT16 German ⇒ English dataset. Although these experiments did not achieve a breakthrough on the metrics, it is hoped that they will inform future research. The third direction was evaluated on both datasets and achieved some improvements.
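The second direction above, re-injecting the position matrix at every self-attention layer rather than only at the input, can be sketched as follows. This is an illustrative NumPy sketch, not the thesis's code: the sinusoidal position matrix follows [Vaswani et al., 2017], while the simplified per-layer loop and the callable stand-ins for attention blocks are assumptions for exposition.

```python
import numpy as np

def position_matrix(seq_len, d_model):
    """Sinusoidal position matrix from [Vaswani et al., 2017]."""
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]         # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                 # even dims: sine
    pe[:, 1::2] = np.cos(angles)                 # odd dims: cosine
    return pe

def encoder_with_per_layer_positions(x, layers):
    """Add the position matrix before every layer, not just the first.

    `layers` is a list of callables standing in for self-attention
    blocks (a simplification for illustration only).
    """
    pe = position_matrix(x.shape[0], x.shape[1])
    h = x
    for layer in layers:
        h = layer(h + pe)   # re-inject sequence information each layer
    return h
```

In the standard transformer the position matrix is added once at the embedding layer; the point of this variant is that deeper layers no longer have to preserve positional information themselves.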
On the IWSLT16 German (Deutsch) ⇒ English dataset, the imbalanced loss improved over the baseline by 1.09 BLEU, i.e. by 5.13%. On the WMT14 English ⇒ German (Deutsch) dataset, it improved over the baseline by 0.22 BLEU, i.e. by 0.94%. The thesis closes with future work, including the pre-training of the position matrix mentioned above, which could not be verified experimentally within the time available for this thesis.
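The third direction's imbalanced loss is described only as a loss carrying the sequence information of the output. A minimal sketch of one such loss, assuming a position-dependent weighting of per-token cross-entropy (the weighting scheme here is illustrative, not the thesis's actual formula):

```python
import numpy as np

def position_weighted_nll(log_probs, targets, weights):
    """Cross-entropy where each output position gets its own weight.

    log_probs: (seq_len, vocab) log-probabilities from the decoder
    targets:   (seq_len,) gold token ids
    weights:   (seq_len,) position-dependent weights, e.g. emphasizing
               early output tokens; the scheme is an assumption here.
    """
    # Negative log-likelihood of the gold token at each position.
    token_nll = -log_probs[np.arange(len(targets)), targets]
    # Weighted average over positions: unequal weights make the loss
    # sensitive to where in the output sequence an error occurs.
    return float(np.sum(weights * token_nll) / np.sum(weights))
```

With uniform weights this reduces to the ordinary average negative log-likelihood; non-uniform weights are what inject output-position information into training.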
Rights: All rights reserved
Access: restricted access

Files in This Item:
File: 5808.pdf
Description: For All Users (off-campus access for PolyU Staff & Students only)
Size: 10.6 MB
Format: Adobe PDF


Copyright Undertaking

As a bona fide Library user, I declare that:

  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/11336