Author: | Chan, Kin Lok |
Title: | On deep learning methods for speech synthesis applications |
Advisors: | Yiu, Cedric (AMA) |
Degree: | M.Phil. |
Year: | 2023 |
Department: | Department of Applied Mathematics |
Pages: | xi, 105 pages : color illustrations |
Language: | English |
Abstract: | Voice Cloning is a speech processing task that aims to synthesize speech in a specific target speaker's voice. It is closely related to Voice Conversion; the difference is that Voice Conversion techniques transform existing audio data, whereas Voice Cloning synthesizes new speech from text. In this thesis, a popular open-source deep-learning-based Voice Cloning model is introduced. The structure of its neural network layers is studied and supporting literature is reviewed. The objective of this project is twofold. First, we aim to optimize the open-source model to boost its performance, especially in low-resource cases in which only a limited amount of data is available. The methods studied in this thesis are optimizing the hyperparameters of the speech synthesis process and fine-tuning the model on a small dataset of target speakers. Improvement in speech quality and voice similarity is observed. The second objective is to develop potential applications of Voice Cloning techniques. In this project, we investigate and propose an educational application: detecting pronunciation errors by comparing speech from real humans with synthesized speech. Existing methods in the field may require either professional language knowledge or numerous examples recorded from real humans. Our proposed method employs a text-to-speech (TTS) model to generate reference speech, so that neither is necessary. In addition, applying Voice Cloning techniques could simplify the comparison between teachers' and students' speech data. |
Rights: | All rights reserved |
Access: | open access |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/13631