Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor | Department of Electronic and Information Engineering | en_US |
dc.contributor.advisor | Mak, M. W. (EIE) | - |
dc.creator | Zhu, Cuiping | - |
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/10111 | - |
dc.language | English | en_US |
dc.publisher | Hong Kong Polytechnic University | - |
dc.rights | All rights reserved | en_US |
dc.title | Multi-task deep learning for gender- and language-independent speaker recognition | en_US |
dcterms.abstract | I-vectors are utterance-level representations that capture the characteristics of both speakers and channels. Given the acoustic vectors of an utterance and the total variability loading matrix of a factor analysis model, the i-vector of the utterance is obtained by computing the posterior mean of the model's latent factors. A deep neural network (DNN) is composed of multiple layers of nonlinear elements that map primitive low-level features in the bottom layers to abstract high-level features in the upper layers. This dissertation discusses several nonlinear activation functions commonly used in DNNs, including the sigmoid, the hyperbolic tangent (tanh), and the rectified linear unit (ReLU). Multi-task learning is a technique for training classifiers (including DNNs) to learn more than one task. Typically, each task is associated with its own objective function, but the tasks share hidden nodes so that learning one task can assist the learning of another. This dissertation investigates the use of multi-task learning in speaker recognition. Specifically, a DNN is trained with i-vectors as input to classify not only speakers but also their genders. This dissertation also investigates adapting a gender-independent DNN into a gender-dependent one by injecting a one-hot gender vector into a hidden layer of the DNN. Results show that learning speaker identities and genders simultaneously enables the multi-task DNN to outperform the single-task DNN, in which only speaker identities are learned. Results also show that injecting gender information into the middle of the DNN effectively makes the DNN more gender-specific, resulting in higher speaker identification accuracy. | en_US |
dcterms.extent | x, 37 pages : color illustrations | en_US |
dcterms.isPartOf | PolyU Electronic Theses | en_US |
dcterms.issued | 2019 | en_US |
dcterms.educationalLevel | M.Sc. | en_US |
dcterms.educationalLevel | All Master | en_US |
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | en_US |
dcterms.LCSH | Automatic speech recognition | en_US |
dcterms.LCSH | Speech processing systems | en_US |
dcterms.LCSH | Machine learning | en_US |
dcterms.LCSH | Neural networks (Computer science) | en_US |
dcterms.accessRights | restricted access | en_US |
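The abstract describes the multi-task network and the gender-injection step only in words. The following is a minimal NumPy sketch of a forward pass under stated assumptions, not the dissertation's implementation: the layer sizes, the 400-dimensional i-vector input, the gender-loss weight `alpha`, and the names `MultiTaskDNN` and `multitask_loss` are all hypothetical, and the weights are random rather than trained. Two shared ReLU hidden layers feed a speaker softmax head and a gender softmax head; with `inject_gender=True`, a one-hot gender vector is concatenated to the output of the first hidden layer before the second one.

```python
import numpy as np

def relu(x):
    # Rectified linear unit, one of the activations discussed in the abstract
    return np.maximum(0.0, x)

def softmax(z):
    # Row-wise softmax; subtracting the row max keeps exp() numerically stable
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class MultiTaskDNN:
    """Hypothetical sketch: two shared ReLU hidden layers feed a speaker head and a
    gender head. With inject_gender=True, a one-hot gender vector is concatenated
    to the first hidden layer's output before the second hidden layer."""

    def __init__(self, ivec_dim, hidden_dim, n_speakers, n_genders=2,
                 inject_gender=False, seed=0):
        rng = np.random.default_rng(seed)
        self.n_genders = n_genders
        self.inject_gender = inject_gender
        extra = n_genders if inject_gender else 0
        # Small random weights; a real system would learn these by backpropagation
        self.W1 = rng.normal(0.0, 0.1, (ivec_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = rng.normal(0.0, 0.1, (hidden_dim + extra, hidden_dim))
        self.b2 = np.zeros(hidden_dim)
        self.W_spk = rng.normal(0.0, 0.1, (hidden_dim, n_speakers))
        self.b_spk = np.zeros(n_speakers)
        self.W_gen = rng.normal(0.0, 0.1, (hidden_dim, n_genders))
        self.b_gen = np.zeros(n_genders)

    def forward(self, x, gender=None):
        h1 = relu(x @ self.W1 + self.b1)
        if self.inject_gender:
            # gender: integer labels of shape (batch,), turned into one-hot rows
            h1 = np.concatenate([h1, np.eye(self.n_genders)[gender]], axis=1)
        h2 = relu(h1 @ self.W2 + self.b2)  # shared representation for both tasks
        p_spk = softmax(h2 @ self.W_spk + self.b_spk)
        p_gen = softmax(h2 @ self.W_gen + self.b_gen)
        return p_spk, p_gen

def multitask_loss(p_spk, p_gen, spk, gen, alpha=0.5):
    # Sum of per-task cross-entropies; alpha (hypothetical) weights the gender task
    idx = np.arange(len(spk))
    ce_spk = -np.mean(np.log(p_spk[idx, spk] + 1e-12))
    ce_gen = -np.mean(np.log(p_gen[idx, gen] + 1e-12))
    return ce_spk + alpha * ce_gen

# Toy usage with random "i-vectors" (400 dimensions is an assumption)
rng = np.random.default_rng(1)
x = rng.normal(size=(4, 400))
spk = np.array([0, 3, 7, 9])
gen = np.array([0, 1, 1, 0])

mt = MultiTaskDNN(ivec_dim=400, hidden_dim=256, n_speakers=10)
p_spk, p_gen = mt.forward(x)
loss = multitask_loss(p_spk, p_gen, spk, gen)

gd = MultiTaskDNN(ivec_dim=400, hidden_dim=256, n_speakers=10, inject_gender=True)
p_spk_gd, _ = gd.forward(x, gender=gen)
print(p_spk.shape, p_gen.shape, float(loss))
```

The abstract reports the multi-task training and the gender injection as separate investigations, so the sketch keeps injection behind a flag. When gender is injected, the gender head receives its own label as input and becomes trivial, so a gender-dependent variant would typically be trained on the speaker loss alone.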
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
991022270855603411.pdf | For All Users (off-campus access for PolyU Staff & Students only) | 1.38 MB | Adobe PDF |
Copyright Undertaking
As a bona fide Library user, I declare that:
- I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
- I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
- I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.
By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/10111