Author: Woo, Chi-wang
Title: Text dependent speaker verification with Hidden Markov Model
Degree: M.Sc.
Year: 2001
Subject: Speech perception; Hong Kong Polytechnic University -- Dissertations
Department: Multi-disciplinary Studies; Department of Electronic Engineering
Pages: iii, 71 leaves : ill. ; 30 cm
Language: English
Abstract: This dissertation explores the use of Mel Frequency Cepstral Coefficients (MFCCs) in text-dependent speaker verification with Continuous Density Hidden Markov Models (CDHMMs). MFCCs were obtained from combination lock phrases and used as feature vectors to train and test the CDHMMs. Because most previous work in this area used small corpora (some of them not suitable for speaker verification) to evaluate verification performance, the scalability of the earlier techniques remains unclear. In this work, the YOHO corpus, a public domain speaker verification corpus, was used to study the performance of the proposed techniques. The performance of CDHMMs on speaker verification was studied with different numbers of states and mixtures. In the experiments, the performance of the CDHMMs improved as the numbers of states and mixtures increased. Beyond a certain limit, however, the CDHMMs became unstable and produced infinite outputs for certain feature vectors. Therefore, a general search for a suitable combination of numbers of states and mixtures, using randomly selected sample phrases, was performed before carrying out further experiments. In speaker verification, a decision threshold must be determined. This threshold can be determined during the training of the CDHMMs by using a set of feature vectors from background speakers as inputs. The decision threshold is adjusted until the false acceptance rate (FAR) is acceptable. Two approaches to text-dependent speaker verification were investigated. In the first approach, a single CDHMM was used to model a combination lock phrase. In the second approach, three CDHMMs were used to model a combination lock phrase, with each CDHMM representing one double-digit word of the phrase. Both techniques were able to identify the speakers successfully, with the second technique producing better results. However, the second approach requires an efficient algorithm to segment the phrase into three double-digit words.
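The threshold adjustment described in the abstract can be illustrated with a minimal sketch. The abstract does not give the dissertation's exact procedure, so everything here is an assumption for illustration: the function names, the "accept when the score is strictly above the threshold" rule, and the background-speaker scores, which are made-up log-likelihood values rather than results from the YOHO corpus.

```python
import math


def choose_threshold(impostor_scores, target_far):
    """Return the lowest threshold whose FAR on impostor_scores is <= target_far.

    Assumption: a claim is accepted when its score is strictly greater than
    the threshold, and impostor_scores are scores that a claimed speaker's
    model assigns to utterances from background (impostor) speakers.
    """
    if not impostor_scores:
        raise ValueError("need at least one impostor score")
    if not 0.0 <= target_far <= 1.0:
        raise ValueError("target_far must lie in [0, 1]")

    ranked = sorted(impostor_scores, reverse=True)
    # Maximum number of impostor trials that may be accepted.
    allowed = math.floor(target_far * len(ranked))
    if allowed >= len(ranked):
        return -math.inf  # every impostor may be accepted
    # Only scores strictly above ranked[allowed] pass, so at most
    # `allowed` impostor trials are accepted.
    return ranked[allowed]


def false_acceptance_rate(impostor_scores, threshold):
    """Fraction of impostor scores that would be (wrongly) accepted."""
    accepted = sum(1 for score in impostor_scores if score > threshold)
    return accepted / len(impostor_scores)


# Hypothetical background-speaker log-likelihoods (illustrative values only).
background = [-52.0, -48.5, -47.0, -45.2, -44.1,
              -43.0, -41.7, -40.3, -39.9, -35.0]
threshold = choose_threshold(background, target_far=0.2)
print(threshold, false_acceptance_rate(background, threshold))
```

Picking the threshold directly from the sorted background scores gives the same result as raising it step by step until the FAR is acceptable, without the need to choose a step size.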
Rights: All rights reserved
Access: restricted access

Files in This Item:
File: b15995835.pdf
Description: For All Users (off-campus access for PolyU Staff & Students only)
Size: 1.88 MB
Format: Adobe PDF


Copyright Undertaking

As a bona fide Library user, I declare that:

  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/1049