Full metadata record
DC Field | Value | Language
dc.contributor | Department of Computing | en_US
dc.contributor.advisor | Li, Wenjie (COMP) | en_US
dc.creator | Mo, Feiyu | -
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/11336 | -
dc.language | English | en_US
dc.publisher | Hong Kong Polytechnic University | en_US
dc.rights | All rights reserved | en_US
dc.title | Improving transformer on generation tasks with sequence information | en_US
dcterms.abstract | In recent years, the Transformer model [Vaswani et al., 2017] has achieved the best results on many text generation tasks [Radford et al., 2018; Devlin et al., 2018]. It represents the order of the input sequence with a position matrix and relies solely on attention mechanisms [Vaswani et al., 2017] in both the encoder and the decoder. However, a substantial body of work shows that the position matrix is a weakness of the Transformer [Dehghani et al., 2019; Li et al., 2016; Lin et al., 2017; Shaw et al., 2018; Shi et al., 2018; Yang et al., 2018]. This thesis therefore continues the effort to address the Transformer's lack of sequence information. It investigates three directions, both analytically and experimentally: replacement, enhancement, and assistance. The first direction replaces the position matrix in the Transformer with an RNN module [Hochreiter et al., 1997; Mikolov et al., 2010]. The second direction makes more effective use of the position matrix, either by adding it to each self-attention layer or by pre-training it, so that the model obtains more sequence information. The third direction guides training with an imbalanced loss function that carries the sequence information of the output. The evaluation metric is BLEU [Papineni et al., 2002]; the number of parameters and the efficiency of each model are also analyzed. The datasets used are IWSLT16 German (Deutsch) ⇒ English and WMT14 English ⇒ German (Deutsch). Experiments in the first and second directions were conducted only on IWSLT16 German (Deutsch) ⇒ English. Although they did not yield a breakthrough in the metrics, they may still offer useful insights for future research. Experiments in the third direction were conducted on both datasets and achieved some improvements.
On IWSLT16 German (Deutsch) ⇒ English, the imbalanced loss improved on the baseline by 1.09 BLEU, a relative increase of 5.13%. On WMT14 English ⇒ German (Deutsch), it improved on the baseline by 0.22 BLEU, a relative increase of 0.94%. The thesis closes with future work, including the pre-training of the position matrix mentioned above, which could not be verified experimentally for lack of time. | en_US
dcterms.extent | ii, viii, 71 pages : color illustrations | en_US
dcterms.isPartOf | PolyU Electronic Theses | en_US
dcterms.issued | 2020 | en_US
dcterms.educationalLevel | M.Sc. | en_US
dcterms.educationalLevel | All Master | en_US
dcterms.LCSH | Text processing (Computer science) | en_US
dcterms.LCSH | Natural language processing (Computer science) | en_US
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | en_US
dcterms.accessRights | restricted access | en_US
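
The "position matrix" referred to in the abstract can be illustrated with the sinusoidal encoding of Vaswani et al. [2017]. The sketch below is only an illustration under that assumption: the record does not state which variant the thesis uses, and the function name `sinusoidal_position_matrix` and its parameters are hypothetical, not taken from the thesis.

```python
import math


def sinusoidal_position_matrix(max_len, d_model):
    """Build a max_len x d_model position matrix (list of rows).

    Assumes the sinusoidal scheme of Vaswani et al. [2017]:
    even columns hold sin(pos / 10000^(i / d_model)),
    odd columns hold the cosine of the same angle.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    matrix = []
    for pos in range(max_len):
        row = []
        for i in range(0, d_model, 2):
            # i is the even column index; each sin/cos pair shares one angle.
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        matrix.append(row)
    return matrix


# Hypothetical sizes, chosen only for illustration.
position_matrix = sinusoidal_position_matrix(max_len=4, d_model=8)
```

In the original Transformer this matrix is added to the token embeddings once, at the input. The second direction described in the abstract would instead reuse it at each self-attention layer or pre-train it.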

Files in This Item:
File | Description | Size | Format
5808.pdf | For All Users (off-campus access for PolyU Staff & Students only) | 10.6 MB | Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/11336