Author: Woo, Chi-wang
Title: Text dependent speaker verification with Hidden Markov Model
Year: 2001
Subject: Speech perception
Hong Kong Polytechnic University -- Dissertations
Department: Multi-disciplinary Studies
Dept. of Electronic Engineering
Pages: iii, 71 leaves : ill. ; 30 cm
Language: English
InnoPac Record: http://library.polyu.edu.hk/record=b1599583
URI: http://theses.lib.polyu.edu.hk/handle/200/1049
Abstract: This dissertation explores the use of Mel Frequency Cepstral Coefficients (MFCCs) in text-dependent speaker verification with Continuous Density Hidden Markov Models (CDHMMs). MFCCs were extracted from combination lock phrases and used as feature vectors to train and test the CDHMMs. Because most previous work in this area evaluates verification performance on small corpora (some of them not suitable for speaker verification), the scalability of the earlier techniques remains unclear. This work therefore uses the YOHO corpus, a public-domain speaker verification corpus, to study the performance of the proposed techniques. The performance of CDHMMs on speaker verification was studied with different numbers of states and mixtures. In the experiments, the performance of the CDHMMs improves as the numbers of states and mixtures increase. Beyond a certain limit, however, the CDHMMs become unstable and produce infinite outputs for certain feature vectors. Therefore, a general search for a suitable combination of numbers of states and mixtures, using randomly selected sample phrases, was performed before carrying out further experiments. In speaker verification, a decision threshold must be determined. This threshold can be determined during the training of the CDHMMs by using a set of feature vectors from background speakers as inputs. The decision threshold is adjusted until the false acceptance rate (FAR) is acceptable. Two approaches to text-dependent speaker verification were investigated in this work. In the first approach, a single CDHMM was used to model a combination lock phrase. In the second approach, three CDHMMs were used to model a combination lock phrase, with each CDHMM representing one double-digit word of the phrase. Both techniques were able to identify the speakers successfully, with the second technique producing better results. However, the second approach requires an efficient algorithm to segment the phrase into three double-digit words.
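
The threshold-setting step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the dissertation's implementation: the function names, the example scores, and the accept rule (a claim is accepted when its score is strictly above the threshold) are all hypothetical, and the scores stand in for whatever a claimed speaker's CDHMM outputs for background-speaker utterances.

```python
import math


def false_acceptance_rate(impostor_scores, threshold):
    """Fraction of background (impostor) scores that would be accepted.

    Assumption: a claim is accepted when its score is strictly greater
    than the threshold.
    """
    accepted = sum(1 for score in impostor_scores if score > threshold)
    return accepted / len(impostor_scores)


def threshold_for_far(impostor_scores, target_far):
    """Pick a threshold whose FAR on the background scores is <= target_far."""
    if not impostor_scores:
        raise ValueError("need at least one background score")
    if not 0.0 <= target_far <= 1.0:
        raise ValueError("target_far must lie in [0, 1]")

    ranked = sorted(impostor_scores, reverse=True)
    # Number of background trials that may be accepted without
    # exceeding the target rate.
    allowed = math.floor(target_far * len(ranked))
    if allowed >= len(ranked):
        # Every background trial may be accepted; no threshold is needed.
        return -math.inf
    # Placing the threshold at the (allowed + 1)-th highest score rejects
    # that score and everything below it, so at most `allowed` are accepted.
    return ranked[allowed]


# Hypothetical log-likelihood scores from background speakers.
background_scores = [-80.0, -75.0, -70.0, -65.0, -60.0,
                     -55.0, -50.0, -45.0, -40.0, -35.0]
threshold = threshold_for_far(background_scores, target_far=0.2)
achieved_far = false_acceptance_rate(background_scores, threshold)
```

Because ties at the threshold are rejected, the achieved FAR on the background set never exceeds the target; the false rejection rate for the true speaker would have to be checked separately on that speaker's own utterances.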

Files in this item

Files Size Format
b15995835.pdf 1.920Mb PDF
