Full metadata record
DC Field | Value | Language
dc.contributor | Department of Electronic and Information Engineering | en_US
dc.contributor.advisor | Mak, M. W. (EIE) | -
dc.creator | Zhu, Cuiping | -
dc.publisher | Hong Kong Polytechnic University | -
dc.rights | All rights reserved | en_US
dc.title | Multi-task deep learning for gender- and language-independent speaker recognition | en_US
dcterms.abstract | I-vectors are utterance-level representations that comprise the characteristics of both speakers and channels. Given the acoustic vectors of an utterance and the total variability loading matrix of a factor analysis model, the i-vector of the utterance is obtained by computing the posterior mean of the latent factors of the model. A deep neural network (DNN) is composed of multiple layers of nonlinear elements that map primitive low-level features from the bottom layers to abstract high-level features in the upper layers. This dissertation discusses several nonlinear activation functions commonly used in DNNs, including the sigmoid, tanh, and rectified linear unit. Multi-task learning is a technique for training classifiers (including DNNs) to learn more than one task. Typically, each task is associated with its own objective function, but the tasks share hidden nodes so that one task can assist the learning of another. This dissertation investigates the use of multi-task learning in speaker recognition. Specifically, a DNN is trained to classify not only speakers but also their genders, using i-vectors as the input. This dissertation also investigates adapting a gender-independent DNN to a gender-dependent one by injecting a one-hot gender vector into a hidden layer of the DNN. Results show that learning the speaker identities and genders simultaneously enables the multi-task DNN to outperform a single-task DNN in which only the speaker identities are learned. Results also show that injecting gender information into the middle of the DNN effectively makes the DNN more gender-specific, resulting in higher speaker identification accuracy. | en_US
dcterms.extent | x, 37 pages : color illustrations | en_US
dcterms.isPartOf | PolyU Electronic Theses | en_US
dcterms.educationalLevel | All Master | en_US
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | en_US
dcterms.LCSH | Automatic speech recognition | en_US
dcterms.LCSH | Speech processing systems | en_US
dcterms.LCSH | Machine learning | en_US
dcterms.LCSH | Neural networks (Computer science) | en_US
dcterms.accessRights | restricted access | en_US
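The abstract above describes a multi-task DNN that takes i-vectors as input, shares hidden layers between a speaker-classification head and a gender-classification head, and can be made gender-dependent by injecting a one-hot gender vector into a hidden layer. A minimal NumPy forward-pass sketch of that kind of architecture follows; all dimensions, weight initializations, and the choice of which layer receives the gender injection are illustrative assumptions, not details taken from the dissertation:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Illustrative sizes (assumptions): 500-dim i-vectors, 256-node hidden
# layers, 100 training speakers, 2 genders.
D_IVEC, D_HID, N_SPK, N_GEN = 500, 256, 100, 2

# Shared hidden layers: both tasks use (and, in training, would
# back-propagate through) these weights.
W1 = 0.01 * rng.standard_normal((D_IVEC, D_HID))
W2 = 0.01 * rng.standard_normal((D_HID, D_HID))
# Task-specific output heads: one softmax per task.
W_spk = 0.01 * rng.standard_normal((D_HID, N_SPK))
W_gen = 0.01 * rng.standard_normal((D_HID, N_GEN))
# Weights mapping a one-hot gender vector into the second hidden layer,
# adapting the gender-independent network into a gender-dependent one.
W_inj = 0.01 * rng.standard_normal((N_GEN, D_HID))

def forward(ivec, gender_onehot=None):
    """Return (speaker posteriors, gender posteriors) for one i-vector."""
    h1 = relu(ivec @ W1)
    z2 = h1 @ W2
    if gender_onehot is not None:
        # Inject gender information into the middle of the network.
        z2 = z2 + gender_onehot @ W_inj
    h2 = relu(z2)
    return softmax(h2 @ W_spk), softmax(h2 @ W_gen)

# One forward pass; training would minimize the sum of the two
# cross-entropy losses (speaker + gender) over the shared weights.
ivec = rng.standard_normal(D_IVEC)
p_spk, p_gen = forward(ivec, gender_onehot=np.array([1.0, 0.0]))
print(p_spk.shape, p_gen.shape)  # (100,) (2,)
```

Because the speaker and gender objectives update the same shared layers, gradients from the easier gender task can regularize the representation used by the speaker head, which is the mechanism behind the reported improvement over the single-task DNN.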

Files in This Item:
File | Description | Size | Format
991022270855603411.pdf | For All Users (off-campus access for PolyU Staff & Students only) | 1.38 MB | Adobe PDF

Copyright Undertaking

As a bona fide Library user, I declare that:

  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/10111