Multi-task deep learning for gender- and language-independent speaker recognition

Zhu, Cuiping

Full metadata record

DC Field	Value	Language
dc.contributor	Department of Electronic and Information Engineering	en_US
dc.contributor.advisor	Mak, M. W. (EIE)	-
dc.creator	Zhu, Cuiping	-
dc.identifier.uri	https://theses.lib.polyu.edu.hk/handle/200/10111	-
dc.language	English	en_US
dc.publisher	Hong Kong Polytechnic University	-
dc.rights	All rights reserved	en_US
dc.title	Multi-task deep learning for gender- and language-independent speaker recognition	en_US
dcterms.abstract	I-vectors are utterance-level representations that capture the characteristics of both speakers and channels. Given the acoustic vectors of an utterance and the total variability loading matrix of a factor analysis model, the i-vector of the utterance is obtained by computing the posterior mean of the model's latent factors. A deep neural network (DNN) is composed of multiple layers of nonlinear elements that map primitive low-level features in the bottom layers to abstract high-level features in the upper layers. This dissertation discusses several nonlinear activation functions commonly used in DNNs, including the sigmoid, tanh, and rectified linear unit. Multi-task learning is a technique for training classifiers (including DNNs) to learn more than one task. Typically, each task is associated with its own objective function, but the tasks share the network's hidden nodes so that one task can assist the learning of another. This dissertation investigates the use of multi-task learning in speaker recognition. Specifically, a DNN is trained to classify not only speakers but also genders, using i-vectors as the input. This dissertation also investigates adapting a gender-independent DNN to a gender-dependent one by injecting a one-hot gender vector into a hidden layer of the DNN. Results show that learning speaker identities and genders simultaneously makes the multi-task DNN outperform the single-task DNN, in which only the speaker identities are learned. Results also show that injecting gender information into the middle of the DNN effectively makes the DNN more gender-specific, resulting in higher speaker identification accuracy.	en_US
dcterms.extent	x, 37 pages : color illustrations	en_US
dcterms.isPartOf	PolyU Electronic Theses	en_US
dcterms.issued	2019	en_US
dcterms.educationalLevel	M.Sc.	en_US
dcterms.educationalLevel	All Master	en_US
dcterms.LCSH	Hong Kong Polytechnic University -- Dissertations	en_US
dcterms.LCSH	Automatic speech recognition	en_US
dcterms.LCSH	Speech processing systems	en_US
dcterms.LCSH	Machine learning	en_US
dcterms.LCSH	Neural networks (Computer science)	en_US
dcterms.accessRights	restricted access	en_US
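
Illustrative note on the abstract: the two ideas it describes, a DNN with separate speaker and gender outputs over shared hidden nodes, and a one-hot gender vector injected into a hidden layer, can be sketched as a small NumPy forward pass. This is a minimal sketch, not the dissertation's actual network. The layer sizes, the ReLU activation, the random weights, the `gender_weight` loss factor, the decision to combine both ideas in one toy network, and every function name below are assumptions made for illustration only.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_params(rng, ivec_dim=8, hidden_dim=16, n_speakers=5, n_genders=2):
    """Random weights for a toy network; all sizes are hypothetical."""
    def layer(n_in, n_out):
        return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)
    return {
        "h1": layer(ivec_dim, hidden_dim),                # hidden layer shared by both tasks
        "h2": layer(hidden_dim + n_genders, hidden_dim),  # receives the injected one-hot gender
        "spk": layer(hidden_dim, n_speakers),             # speaker classification output
        "gen": layer(hidden_dim, n_genders),              # gender classification output
    }

def forward(params, ivectors, gender_onehot):
    W, b = params["h1"]
    h1 = relu(ivectors @ W + b)
    # Assumption: the gender output branches off before the injection point,
    # so the injected vector cannot leak into the gender prediction.
    W, b = params["gen"]
    p_gen = softmax(h1 @ W + b)
    # Inject the one-hot gender vector by concatenating it to the hidden activations.
    W, b = params["h2"]
    h2 = relu(np.concatenate([h1, gender_onehot], axis=1) @ W + b)
    W, b = params["spk"]
    p_spk = softmax(h2 @ W + b)
    return p_spk, p_gen

def multitask_loss(p_spk, p_gen, spk_labels, gen_labels, gender_weight=0.5):
    """One cross-entropy objective per task, combined as a weighted sum."""
    idx = np.arange(len(spk_labels))
    ce_spk = -np.log(p_spk[idx, spk_labels]).mean()
    ce_gen = -np.log(p_gen[idx, gen_labels]).mean()
    return ce_spk + gender_weight * ce_gen

# Toy usage with random stand-ins for i-vectors (not real data).
rng = np.random.default_rng(0)
params = init_params(rng)
ivectors = rng.normal(size=(4, 8))
spk_labels = np.array([0, 3, 2, 4])
gen_labels = np.array([0, 1, 1, 0])
p_spk, p_gen = forward(params, ivectors, np.eye(2)[gen_labels])
loss = multitask_loss(p_spk, p_gen, spk_labels, gen_labels)
```

In a trained system, the gradient of the combined loss would update the shared `h1` weights from both tasks, which is the mechanism by which one task can assist the other. Only the forward pass and the loss are shown here.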

Files in This Item:

File	Description	Size	Format
991022270855603411.pdf	For All Users (off-campus access for PolyU Staff & Students only)	1.38 MB	Adobe PDF

Copyright Undertaking

As a bona fide Library user, I declare that:

I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/10111