Text dependent speaker verification with Hidden Markov Model

Woo, Chi-wang

Full metadata record

DC Field	Value	Language
dc.contributor	Multi-disciplinary Studies	en_US
dc.contributor	Department of Electronic Engineering	en_US
dc.creator	Woo, Chi-wang	-
dc.identifier.uri	https://theses.lib.polyu.edu.hk/handle/200/1049	-
dc.language	English	en_US
dc.publisher	Hong Kong Polytechnic University	-
dc.rights	All rights reserved	en_US
dc.title	Text dependent speaker verification with Hidden Markov Model	en_US
dcterms.abstract	This dissertation explores the use of Mel Frequency Cepstral Coefficients (MFCCs) in text-dependent speaker verification with Continuous Density Hidden Markov Models (CDHMMs). MFCCs were obtained from combination lock phrases and used as feature vectors to train and test the CDHMMs. Because most previous work in this area used small corpora (some not suitable for speaker verification) to evaluate verification performance, the scalability of the previous techniques remains unclear. In this work, the YOHO corpus, a public domain speaker verification corpus, was used to study the performance of the proposed techniques, and the performance of CDHMMs with different numbers of states and mixtures was studied. In the experiments, the performance of the CDHMMs improved as the numbers of states and mixtures increased. Beyond a certain limit, however, the CDHMMs became unstable and produced infinite outputs for certain feature vectors. Therefore, a general search for a suitable combination of numbers of states and mixtures, using randomly selected sample phrases, was performed before further experiments were carried out. Speaker verification requires a decision threshold. This threshold can be determined during the training of the CDHMMs by using a set of feature vectors from background speakers as inputs and adjusting the threshold until the false acceptance rate (FAR) is acceptable. Two approaches to text-dependent speaker verification were investigated. In the first approach, a single CDHMM was used to model a combination lock phrase. In the second approach, three CDHMMs were used to model a combination lock phrase, with each CDHMM representing one double-digit word of the phrase. Both techniques were able to identify the speakers successfully, with the second technique producing better results. However, the second approach requires an efficient algorithm to segment the phrase into three double-digit words.	en_US
dcterms.extent	iii, 71 leaves : ill. ; 30 cm	en_US
dcterms.isPartOf	PolyU Electronic Theses	en_US
dcterms.issued	2001	en_US
dcterms.educationalLevel	All Master	en_US
dcterms.educationalLevel	M.Sc.	en_US
dcterms.LCSH	Speech perception	en_US
dcterms.LCSH	Hong Kong Polytechnic University -- Dissertations	en_US
dcterms.accessRights	restricted access	en_US
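
The abstract describes setting the decision threshold from background-speaker scores until the false acceptance rate (FAR) is acceptable. The following is a minimal sketch of that idea, not the dissertation's own procedure: the function names, the rule "accept when score > threshold", the target FAR of 0.2, and the score values are all assumptions made up for illustration.

```python
import math


def false_acceptance_rate(impostor_scores, threshold):
    """Fraction of background (impostor) trials accepted at this threshold."""
    accepted = sum(1 for s in impostor_scores if s > threshold)
    return accepted / len(impostor_scores)


def pick_threshold(impostor_scores, target_far):
    """Lowest threshold drawn from the scores whose FAR does not exceed target_far.

    Assumption: a claim is accepted when its model score (for example a
    log-likelihood from the claimed speaker's CDHMM) is strictly greater
    than the threshold.
    """
    if not impostor_scores:
        raise ValueError("need at least one background-speaker score")
    ranked = sorted(impostor_scores, reverse=True)
    # Number of impostor trials that may be accepted without exceeding the target.
    allowed = math.floor(target_far * len(ranked))
    if allowed >= len(ranked):
        return -math.inf  # every impostor trial may be accepted
    # Only scores strictly above ranked[allowed] are accepted: at most `allowed`.
    return ranked[allowed]


# Hypothetical background-speaker scores against one client model.
background = [-62.0, -58.5, -71.2, -55.1, -66.3, -60.4, -69.8, -57.0, -64.9, -73.5]
threshold = pick_threshold(background, target_far=0.2)
far = false_acceptance_rate(background, threshold)
```

Picking the threshold from the sorted scores avoids an iterative search; a looser target FAR lowers the threshold and accepts more impostor trials, so a real system would weigh it against the false rejection rate on genuine trials.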

Files in This Item:

File	Description	Size	Format
b15995835.pdf	For All Users (off-campus access for PolyU Staff & Students only)	1.88 MB	Adobe PDF

Copyright Undertaking

As a bona fide Library user, I declare that:

I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/1049