|Title:||Multi-task deep learning for gender- and language-independent speaker recognition|
|Advisors:||Mak, M. W. (EIE)|
|Subject:||Hong Kong Polytechnic University -- Dissertations|
Automatic speech recognition
Speech processing systems
Neural networks (Computer science)
|Department:||Department of Electronic and Information Engineering|
|Pages:||x, 37 pages : color illustrations|
|Abstract:||I-vectors are utterance-level representations that capture the characteristics of both speakers and channels. Given the acoustic vectors of an utterance and the total variability loading matrix of a factor analysis model, the i-vector of the utterance is obtained by computing the posterior mean of the model's latent factors. A deep neural network (DNN) is composed of multiple layers of nonlinear elements that map primitive low-level features at the bottom layers to abstract high-level features in the upper layers. This dissertation discusses several nonlinear activation functions commonly used in DNNs, including the sigmoid, tanh, and rectified linear unit. Multi-task learning is a technique for training classifiers (including DNNs) to learn more than one task. Typically, each task is associated with its own objective function, but the tasks share hidden nodes so that one task can assist the learning of another. This dissertation investigates the use of multi-task learning in speaker recognition. Specifically, a DNN is trained to classify not only speakers but also their genders, using i-vectors as input. The dissertation also investigates adapting a gender-independent DNN into a gender-dependent one by injecting a one-hot gender vector into a hidden layer of the DNN. Results show that learning speaker identities and genders simultaneously enables the multi-task DNN to outperform a single-task DNN that learns only speaker identities. Results also show that injecting gender information into the middle of the DNN effectively makes the DNN more gender-specific, resulting in higher speaker identification accuracy.|
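The architecture described in the abstract can be sketched as a forward pass: shared hidden layers feed two task-specific softmax heads (speaker and gender), with a one-hot gender vector injected at a hidden layer. This is a minimal NumPy illustration, not the dissertation's actual network; the layer sizes, weight initialization, loss weighting, and the exact injection point are illustrative assumptions, and training (backpropagation) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 400-dim i-vectors, 64-unit hidden layers,
# 10 speakers, 2 genders (all assumed, not from the dissertation).
D_IN, D_HID, N_SPK, N_GEN = 400, 64, 10, 2

def relu(x):
    # Rectified linear unit, one of the activations the abstract mentions.
    return np.maximum(0.0, x)

def softmax(x):
    # Numerically stable row-wise softmax.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Shared hidden layers, learned jointly by both tasks.
W1 = rng.standard_normal((D_IN, D_HID)) * 0.01
# The second layer also receives the injected one-hot gender vector,
# hence the +N_GEN input columns.
W2 = rng.standard_normal((D_HID + N_GEN, D_HID)) * 0.01
# Task-specific output heads on top of the shared representation.
W_spk = rng.standard_normal((D_HID, N_SPK)) * 0.01
W_gen = rng.standard_normal((D_HID, N_GEN)) * 0.01

def forward(ivec, gender_onehot):
    """I-vectors in, speaker and gender posteriors out.
    The gender one-hot is concatenated into the second hidden layer."""
    h1 = relu(ivec @ W1)
    h2 = relu(np.concatenate([h1, gender_onehot], axis=-1) @ W2)
    return softmax(h2 @ W_spk), softmax(h2 @ W_gen)

def multitask_loss(p_spk, p_gen, y_spk, y_gen, alpha=0.5):
    """Weighted sum of the two cross-entropy objectives,
    one per task (alpha is an assumed weighting)."""
    n = np.arange(len(y_spk))
    ce_spk = -np.log(p_spk[n, y_spk]).mean()
    ce_gen = -np.log(p_gen[n, y_gen]).mean()
    return alpha * ce_spk + (1 - alpha) * ce_gen

# Usage: a batch of 3 synthetic i-vectors with gender labels.
x = rng.standard_normal((3, D_IN))
g = np.eye(N_GEN)[[0, 1, 0]]   # one-hot gender vectors
p_spk, p_gen = forward(x, g)
loss = multitask_loss(p_spk, p_gen, np.array([1, 4, 7]), np.array([0, 1, 0]))
```

Minimizing the combined loss pushes the shared weights W1 and W2 to encode features useful for both tasks, which is how one task can assist the learning of the other.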
|Rights:||All rights reserved|
Files in This Item:
|991022270855603411.pdf||For All Users (off-campus access for PolyU Staff & Students only)||1.38 MB||Adobe PDF||View/Open|