Full metadata record
DC Field | Value | Language
dc.contributor | Department of Applied Mathematics | en_US
dc.contributor.advisor | Yiu, Cedric (AMA) | en_US
dc.creator | Chan, Kin Lok | -
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/13631 | -
dc.language | English | en_US
dc.publisher | Hong Kong Polytechnic University | en_US
dc.rights | All rights reserved | en_US
dc.title | On deep learning methods for speech synthesis applications | en_US
dcterms.abstract | Voice Cloning is a speech processing task that aims to synthesize speech in a specific target speaker's voice. A closely related topic in the field is Voice Conversion. The difference is that, while Voice Conversion techniques process existing audio data, Voice Cloning synthesizes new speech from text. In this thesis, a popular open-source deep-learning-based Voice Cloning model is introduced. The structure of the neural network layers is studied and supporting literature is reviewed. | en_US
dcterms.abstract | The objective of this project is twofold. First, we want to optimize the open-source model to boost its performance, especially in low-resource cases in which only a limited amount of data is available. The methods studied in this thesis are to optimize the hyperparameters of the speech synthesis process and to fine-tune the model using a small dataset of target speakers. Improvements in speech quality and voice similarity are observed. | en_US
dcterms.abstract | The second objective is to develop potential applications of Voice Cloning techniques. In this project, we investigate and propose an educational application: detecting pronunciation errors by comparing real human speech with synthesized speech. Existing methods in the field may require either professional language knowledge or numerous examples recorded from real humans. Our proposed method employs a TTS model to generate reference speech so that these are no longer necessary. In addition, applying Voice Cloning techniques could simplify the comparison between teachers' and students' speech data. | en_US
dcterms.extent | xi, 105 pages : color illustrations | en_US
dcterms.isPartOf | PolyU Electronic Theses | en_US
dcterms.issued | 2023 | en_US
dcterms.educationalLevel | M.Phil. | en_US
dcterms.educationalLevel | All Master | en_US
dcterms.accessRights | open access | en_US

Files in This Item:
File | Description | Size | Format
8078.pdf | For All Users | 2.33 MB | Adobe PDF


Copyright Undertaking

As a bona fide Library user, I declare that:

  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13631