Author: | Yao, Qi |
Title: | SNR-invariant deep neural networks using multi-task learning for robust I-vector speaker verification |
Advisors: | Mak, M. W. (EIE) |
Degree: | M.Sc. |
Year: | 2018 |
Subject: | Hong Kong Polytechnic University -- Dissertations; Automatic speech recognition; Speech processing systems |
Department: | Department of Electronic and Information Engineering |
Pages: | xvi, 92 pages : color illustrations |
Language: | English |
Abstract: | Text-independent speaker verification (SV) is a binary classification task that aims to verify the identity of speakers by analyzing and classifying their voices. The i-vector feature representation together with the probabilistic linear discriminant analysis (PLDA) backend has achieved state-of-the-art performance. However, applying the i-vector/PLDA framework to real-world noisy environments remains challenging, because i-vectors represent all kinds of variabilities in the total variability space. This dissertation shows that i-vectors form clusters according to the SNR level of utterances. In light of this SNR-dependent clustering phenomenon, we propose three deep neural networks (DNNs) to compensate for the channel and SNR variabilities directly in the i-vector space. These three DNNs are named Regression DNN (RDNN), Hierarchical Regression DNNs (H-RDNNs) and Multi-Task DNN (MT-DNN), respectively. The RDNN takes noisy i-vectors as input and maps them to speaker-dependent cluster means. The H-RDNNs are formed by stacking a second regression DNN on top of the RDNN. The second stage of the H-RDNN aims to regularize the outliers that cannot be denoised properly by the RDNN. The MT-DNN uses an extra speaker classification task as an auxiliary task to retain speaker information in the denoised i-vectors. This secondary task is trained together with the primary (regression) task using an alternating-backpropagation algorithm. We found that among all DNN-based denoising models, the MT-DNN achieves the best performance for denoising the noisy i-vectors. Experiments based on NIST 2012 SRE suggest that DNN-based approaches together with the PLDA backend significantly outperform the multi-condition PLDA model and the mixture of PLDA models. Furthermore, the MT-DNN achieves considerable improvements, with a 23% reduction in EER and a 9% reduction in minDCF on average in Common Conditions (CC) 4 and 5, even in an SNR mismatch condition. |
Rights: | All rights reserved |
Access: | restricted access |
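The abstract describes the RDNN as mapping noisy i-vectors to speaker-dependent cluster means. As a rough illustration only, the sketch below builds such regression targets by averaging each speaker's i-vectors. The function name `speaker_mean_targets`, the toy 2-dimensional vectors, and the choice to average over all of a speaker's utterances are assumptions made for this example; the record does not state how the dissertation computes the cluster means.

```python
from collections import defaultdict


def speaker_mean_targets(ivectors, speaker_ids):
    """Pair every i-vector with the mean i-vector of its speaker.

    Hypothetical helper: averaging over all utterances of a speaker is an
    assumption, not a detail confirmed by the abstract.
    """
    sums = {}
    counts = defaultdict(int)
    for vec, spk in zip(ivectors, speaker_ids):
        if spk not in sums:
            sums[spk] = [0.0] * len(vec)
        # Accumulate the element-wise sum for this speaker.
        sums[spk] = [s + v for s, v in zip(sums[spk], vec)]
        counts[spk] += 1
    # Divide each accumulated sum by the number of utterances.
    means = {spk: [s / counts[spk] for s in total] for spk, total in sums.items()}
    # One target per input vector, in the original order.
    targets = [means[spk] for spk in speaker_ids]
    return targets, means


# Toy data: two utterances from one speaker, one from another.
ivectors = [[1.0, 2.0], [3.0, 4.0], [10.0, 0.0]]
speaker_ids = ["spk_a", "spk_a", "spk_b"]
targets, means = speaker_mean_targets(ivectors, speaker_ids)
```

A regression network trained on `(ivectors[i], targets[i])` pairs would then learn to pull each noisy i-vector toward its speaker's mean, which is the denoising idea the abstract outlines.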
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
991022144624903411.pdf | For All Users (off-campus access for PolyU Staff & Students only) | 6.34 MB | Adobe PDF | View/Open |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/9571