Full metadata record
DC Field | Value | Language
dc.contributor | Multi-disciplinary Studies | en_US
dc.contributor | Department of Electronic Engineering | en_US
dc.creator | Woo, Chi-wang | -
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/1049 | -
dc.language | English | en_US
dc.publisher | Hong Kong Polytechnic University | -
dc.rights | All rights reserved | en_US
dc.title | Text dependent speaker verification with Hidden Markov Model | en_US
dcterms.abstract | This dissertation explores the use of Mel Frequency Cepstral Coefficients (MFCCs) in text-dependent speaker verification with Continuous Density Hidden Markov Models (CDHMMs). MFCCs were obtained from combination lock phrases and used as feature vectors to train the CDHMMs and test their performance. Because most previous work in this area uses small corpora (some of them not suitable for speaker verification) to evaluate verification performance, the scalability of the earlier techniques remains unclear. In this work, the YOHO corpus, a public-domain speaker verification corpus, was used to study the performance of the proposed techniques. The performance of CDHMMs with different numbers of states and mixtures was also studied. In the experiments, performance improves as the numbers of states and mixtures increase. Beyond a certain limit, however, the CDHMMs become unstable and produce infinite outputs for certain feature vectors. Therefore, a general search for a suitable combination of numbers of states and mixtures, using randomly selected sample phrases, was performed before carrying out further experiments. Speaker verification also requires a decision threshold. This threshold can be determined during the training of the CDHMMs by using a set of feature vectors from background speakers as inputs: the decision threshold is adjusted until the false acceptance rate (FAR) is acceptable. Two approaches to text-dependent speaker verification were investigated. In the first, a single CDHMM was used to model a combination lock phrase. In the second, three CDHMMs were used to model a combination lock phrase, with each CDHMM representing one double-digit word of the phrase. Both techniques were able to identify the speakers successfully, with the second producing better results. However, the second approach requires an efficient algorithm to segment the phrase into three double-digit words. | en_US
dcterms.extent | iii, 71 leaves : ill. ; 30 cm | en_US
dcterms.isPartOf | PolyU Electronic Theses | en_US
dcterms.issued | 2001 | en_US
dcterms.educationalLevel | All Master | en_US
dcterms.educationalLevel | M.Sc. | en_US
dcterms.LCSH | Speech perception | en_US
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | en_US
dcterms.accessRights | restricted access | en_US
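
The abstract states that the decision threshold is adjusted until the false acceptance rate (FAR) on background-speaker data is acceptable. The sketch below illustrates one way such a threshold could be chosen; it is not taken from the dissertation. It assumes that each background-speaker utterance has already been scored against the claimed speaker's CDHMM (for example as a log-likelihood), that a claim is accepted when its score is strictly above the threshold, and that the names threshold_for_far and false_acceptance_rate are hypothetical helpers introduced here. Instead of adjusting the threshold step by step, it sorts the scores and reads the threshold off directly.

```python
import math


def false_acceptance_rate(impostor_scores, threshold):
    """Fraction of impostor (background-speaker) scores accepted at this threshold.

    Assumption: a claim is accepted when its score is strictly greater
    than the threshold.
    """
    accepted = sum(1 for s in impostor_scores if s > threshold)
    return accepted / len(impostor_scores)


def threshold_for_far(impostor_scores, target_far):
    """Return the lowest threshold whose FAR on impostor_scores is <= target_far.

    impostor_scores: scores (e.g. log-likelihoods) of background-speaker
    utterances against the claimed speaker's model. Hypothetical input;
    the dissertation's actual scoring is not reproduced here.
    """
    if not impostor_scores:
        raise ValueError("need at least one impostor score")
    if not 0.0 <= target_far <= 1.0:
        raise ValueError("target_far must lie in [0, 1]")

    ranked = sorted(impostor_scores, reverse=True)
    n = len(ranked)
    # Number of impostor trials we are allowed to accept.
    allowed = math.floor(target_far * n)
    if allowed >= n:
        # Every impostor may be accepted, so any threshold works.
        return -math.inf
    # Scores strictly above the (allowed + 1)-th highest score are accepted,
    # which is at most `allowed` trials even when scores are tied.
    return ranked[allowed]


# Made-up background-speaker scores, for illustration only.
background = [-50, -42, -61, -39, -55, -47, -58, -44, -52, -49]
thr = threshold_for_far(background, target_far=0.2)
print(thr, false_acceptance_rate(background, thr))
```

A lower target FAR pushes the threshold up, which in turn raises the false rejection rate for genuine speakers, so in practice the target would be chosen with both error rates in mind.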

Files in This Item:
File | Description | Size | Format
b15995835.pdf | For All Users (off-campus access for PolyU Staff & Students only) | 1.88 MB | Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/1049