Author: | Woo, Chi-wang |
Title: | Text dependent speaker verification with Hidden Markov Model |
Degree: | M.Sc. |
Year: | 2001 |
Subject: | Speech perception; Hong Kong Polytechnic University -- Dissertations |
Department: | Multi-disciplinary Studies; Department of Electronic Engineering |
Pages: | iii, 71 leaves : ill. ; 30 cm |
Language: | English |
Abstract: | This dissertation explores the use of Mel Frequency Cepstral Coefficients (MFCCs) in text-dependent speaker verification with Continuous Density Hidden Markov Models (CDHMMs). MFCCs were extracted from combination lock phrases and used as feature vectors to train and test the CDHMMs. Because most previous work in this area evaluated verification performance on small corpora (some of them not suited to speaker verification), the scalability of the earlier techniques remains unclear. This work therefore used the YOHO corpus, a public domain speaker verification corpus, to study the performance of the proposed techniques. The performance of CDHMMs with different numbers of states and mixtures was studied. In the experiments, performance improved as the numbers of states and mixtures increased. Beyond a certain limit, however, the CDHMMs became unstable and produced infinite outputs for certain feature vectors. Therefore, before further experiments were carried out, a general search for a suitable combination of numbers of states and mixtures was performed on randomly selected sample phrases. Speaker verification also requires a decision threshold. The threshold can be determined during CDHMM training by using a set of feature vectors from background speakers as inputs and adjusting the threshold until the false acceptance rate (FAR) is acceptable. Two approaches to text-dependent speaker verification were investigated. In the first approach, a single CDHMM models a whole combination lock phrase. In the second approach, three CDHMMs model a combination lock phrase, each representing one double-digit word of the phrase. Both approaches were able to identify the speakers successfully, with the second producing better results. However, the second approach requires an efficient algorithm to segment each phrase into its three double-digit words. |
Rights: | All rights reserved |
Access: | restricted access |
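The abstract describes scoring combination lock phrases with CDHMMs and reports that large numbers of states and mixtures produced infinite outputs. The sketch below is a minimal illustration, not the dissertation's code. It assumes already-trained parameters, a diagonal-covariance Gaussian mixture per state, and a left-to-right topology, and it scores a phrase with the standard forward algorithm kept in the log domain. The names `logsumexp`, `gmm_log_density`, `forward_log_likelihood`, `left_to_right_log_A` and the `var_floor` parameter are hypothetical. The variance floor guards against one common cause of unbounded likelihoods (a mixture component collapsing onto very few frames); the abstract does not state what caused the instability in this work.

```python
import numpy as np


def logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a))); all -inf inputs give -inf, not NaN."""
    a = np.asarray(a, dtype=float)
    m = np.max(a, axis=axis, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)  # avoid (-inf) - (-inf) = NaN
    with np.errstate(divide="ignore"):  # log(0) -> -inf is intended here
        out = np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True)) + m
    return out.item() if axis is None else np.squeeze(out, axis=axis)


def gmm_log_density(x, weights, means, variances, var_floor=1e-3):
    """log b_j(x) for one state: a diagonal-covariance Gaussian mixture.

    x: (D,) feature vector; weights: (M,); means, variances: (M, D).
    var_floor (hypothetical safeguard) clamps variances so the density stays finite.
    """
    variances = np.maximum(variances, var_floor)
    d = x.shape[0]
    log_norm = -0.5 * (d * np.log(2.0 * np.pi) + np.sum(np.log(variances), axis=1))
    log_quad = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    with np.errstate(divide="ignore"):
        log_w = np.log(weights)
    return logsumexp(log_w + log_norm + log_quad)


def forward_log_likelihood(frames, log_pi, log_A, states):
    """log p(frames | CDHMM) via the forward algorithm in the log domain.

    frames: (T, D) feature vectors, e.g. the MFCCs of one phrase.
    log_pi: (N,) initial-state log-probabilities.
    log_A: (N, N) transition log-probabilities, log_A[i, j] = log P(j | i).
    states: N tuples (weights, means, variances), one mixture per state.
    """
    log_b = np.array([[gmm_log_density(x, *s) for s in states] for x in frames])
    log_alpha = log_pi + log_b[0]
    for t in range(1, len(frames)):
        # log_alpha_t(j) = log sum_i exp(log_alpha_{t-1}(i) + log_A[i, j]) + log b_j(x_t)
        log_alpha = logsumexp(log_alpha[:, None] + log_A, axis=0) + log_b[t]
    return logsumexp(log_alpha)


def left_to_right_log_A(n_states, p_stay=0.6):
    """Hypothetical left-to-right topology: each state either stays or advances one step."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i], A[i, i + 1] = p_stay, 1.0 - p_stay
    A[-1, -1] = 1.0
    with np.errstate(divide="ignore"):
        return np.log(A)


# Usage with a toy 3-state, 2-mixture model in 2-D (made-up parameters, not from the thesis).
rng = np.random.default_rng(0)
N, M, D = 3, 2, 2
states = [(np.full(M, 1.0 / M), np.full((M, D), 3.0 * j), np.ones((M, D))) for j in range(N)]
log_pi = np.array([0.0, -np.inf, -np.inf])  # always start in the first state
log_A = left_to_right_log_A(N)
matching = np.repeat(np.arange(N) * 3.0, 4)[:, None] + 0.1 * rng.standard_normal((12, D))
mismatched = 10.0 + 0.1 * rng.standard_normal((12, D))
print(forward_log_likelihood(matching, log_pi, log_A, states))
print(forward_log_likelihood(mismatched, log_pi, log_A, states))
```

Working with log probabilities turns the product of many small per-frame densities into a sum, so a whole phrase can be scored without the likelihood underflowing to zero.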
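The threshold procedure in the abstract (score background speakers, then adjust the threshold until the FAR is acceptable) can be sketched as below. This is an illustration under stated assumptions, not the dissertation's method: each trial is assumed to be reduced to one score where higher means a better match to the claimed speaker (for example a CDHMM log-likelihood), the threshold is set per claimed speaker, and a trial is accepted when its score is at least the threshold. The function names and the `target_far` parameter are hypothetical, and the scores in the usage lines are made up.

```python
import numpy as np


def false_acceptance_rate(impostor_scores, threshold):
    """Share of background-speaker (impostor) trials accepted, i.e. score >= threshold."""
    return float(np.mean(np.asarray(impostor_scores, dtype=float) >= threshold))


def false_rejection_rate(genuine_scores, threshold):
    """Share of the claimed speaker's own trials rejected, i.e. score < threshold."""
    return float(np.mean(np.asarray(genuine_scores, dtype=float) < threshold))


def threshold_for_far(impostor_scores, target_far):
    """Lowest candidate threshold whose FAR on the background set is <= target_far.

    Hypothetical helper. Candidates are the observed impostor scores (ascending)
    plus one value just above the largest, which always gives FAR = 0. Taking the
    lowest qualifying threshold keeps false rejections as low as the FAR target allows.
    """
    s = np.sort(np.asarray(impostor_scores, dtype=float))
    candidates = np.append(s, np.nextafter(s[-1], np.inf))
    for th in candidates:  # FAR can only fall as the threshold rises
        if false_acceptance_rate(s, th) <= target_far:
            return float(th)
    raise ValueError("target_far must be >= 0")


# Made-up scores for one claimed speaker (not data from the thesis).
background = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
genuine = [8, 9.5, 11, 12]
th = threshold_for_far(background, target_far=0.2)
print(th, false_acceptance_rate(background, th), false_rejection_rate(genuine, th))
# prints 9.0 0.2 0.25
```

Raising `target_far` lowers the threshold and with it the false rejection rate; the abstract does not say which FAR was considered acceptable.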
Files in This Item:
File | Description | Size | Format
---|---|---|---
b15995835.pdf | For All Users (off-campus access for PolyU Staff & Students only) | 1.88 MB | Adobe PDF
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/1049