|A fast scoring method for PLDA with uncertainty propagation
|Mak, M. W. (EIE)
|Hong Kong Polytechnic University -- Dissertations
Automatic speech recognition
|Department of Electronic and Information Engineering
|vii, 57 pages : color illustrations
|Speaker verification refers to the task of determining whether or not a claimant is the person he/she claims to be. In text-independent speaker verification, using i-vectors as a low-dimensional feature representation and probabilistic linear discriminant analysis (PLDA) for session compensation and classification has achieved state-of-the-art performance in many scenarios. However, the good performance of the standard i-vector/PLDA framework relies on the condition that both the enrolment utterances and the test utterances are sufficiently long for reliable estimation of i-vectors. In real applications, both enrolment and test utterances could be very short, resulting in erroneous i-vector estimation. Recently, an innovative approach to addressing the short-utterance problem in the i-vector/PLDA framework has been proposed. By propagating the covariance of i-vectors into the PLDA model, this approach explicitly expresses the uncertainty of i-vector extraction in the verification stage. The method is called Uncertainty Propagation (UP). It has shown superior performance over the standard i-vector/PLDA framework in short-utterance scenarios. However, the method leads to session-dependent loading matrices in the PLDA model, which makes the verification process computationally expensive. Besides, the method also requires a large amount of memory for storing the covariance matrices of the target speakers' i-vectors. A method to alleviate the computational burden and memory requirement of Uncertainty Propagation is imperative. This thesis proposes a method to speed up the verification process and to reduce the memory requirement of UP by building a repository that stores the length-dependent matrices. During verification, the appropriate length-dependent matrices are selected for scoring.
Experiments on the NIST 2012 Speaker Recognition Evaluation show that the proposed method performs as well as standard UP while requiring only 3.7% of the scoring time and 37% of the memory that standard UP would take. Besides, with a minor compromise in performance (an increase of 0.35% in equal error rate, EER), the method can further reduce memory consumption to only 15% of that of standard UP.
|All rights reserved