A text-independent speaker verification system based on Gaussian mixture speaker models

Wong, Chi-bun

Full metadata record

DC Field	Value	Language
dc.contributor	Multi-disciplinary Studies	en_US
dc.contributor	Department of Electronic Engineering	en_US
dc.creator	Wong, Chi-bun	-
dc.identifier.uri	https://theses.lib.polyu.edu.hk/handle/200/823	-
dc.language	English	en_US
dc.publisher	Hong Kong Polytechnic University	-
dc.rights	All rights reserved	en_US
dc.title	A text-independent speaker verification system based on Gaussian mixture speaker models	en_US
dcterms.abstract	Because of the growing demand for and popularity of 24-hour telephone banking services, the security of financial transactions on the public telephone network has become an important issue. Since different speakers have different speech features, speaker verification is believed to be a possible way to enhance the security of these transactions. This project proposes to use Gaussian mixture models (GMMs) for speaker verification. A GMM adopts an implicit segmentation approach: it provides a probabilistic model of the underlying sounds of a person's voice, but it does not impose any Markovian constraints on the sound classes. It can also be integrated with a statistical background noise model for noise robustness. Furthermore, it models speaker identity based on two interpretations. Firstly, the Gaussian components are used to represent the spectral shapes of phonetic sounds, which characterise a person's voice. Secondly, by modelling the underlying acoustic classes, it can better model the short-term variations of a person's voice, allowing high verification performance for short utterances. In this project, GMMs with diagonal covariance matrices have been built and text-independent speaker verification experiments have been performed. All speech samples were taken from dialect region 2 of the DARPA TIMIT speech corpus. In order to compare the performance of the Gaussian mixture models in noisy and noise-free environments, both TIMIT (a noise-free speech database) and NTIMIT (a noisy, telephone-bandwidth speech database) were used in the experiments. The objective of the experiments is to evaluate the performance of the GMMs in terms of speaker verification error rates. The experiments can be divided into three steps. Initially, models for 20 speakers were trained using the SA and SX sentence sets of the TIMIT and NTIMIT corpora. 
Next, a threshold is calculated, which is used to determine whether to accept or reject a claimant when his feature vectors are presented to the speaker GMM. Finally, all feature vectors derived from the SI sentence set of 36 impostors are input into the GMMs to calculate the individual false acceptance rates and the overall false acceptance rate. The results show that, for the TIMIT database, the verification error rate decreases when the number of Gaussian components increases from 8 to 64. The lowest achievable error rate is 4.13%. For the NTIMIT database, the verification error rate decreases when the number of Gaussian components increases from 8 to 40; however, the error rate increases when the number of Gaussian components is greater than 40. The lowest achievable error rate, AErr, is 10.96%. From these results, it can be concluded that GMMs are better suited to a noise-free environment, where they can achieve a very low verification error rate. If GMMs are applied in a noisy environment, a better preprocessing technique is essential. Moreover, the verification results show that the performance of Gaussian mixture models is speaker and impostor dependent. This may be because some speakers or impostors have dependent components. It is found that a better recognition result can be achieved by replacing the diagonal covariance matrices with full covariance matrices. In addition, the clustering size analysis shows that more than 70% of the Gaussian components are not well trained and can therefore be removed, resulting in fewer parameters and faster calculation.	en_US
dcterms.extent	42, [16] leaves : ill. ; 30 cm	en_US
dcterms.isPartOf	PolyU Electronic Theses	en_US
dcterms.issued	1998	en_US
dcterms.educationalLevel	All Master	en_US
dcterms.educationalLevel	M.Sc.	en_US
dcterms.LCSH	Speech processing systems	en_US
dcterms.LCSH	Automatic speech recognition	en_US
dcterms.LCSH	Hong Kong Polytechnic University -- Dissertations	en_US
dcterms.accessRights	restricted access	en_US
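
Illustrative note (not part of the catalogued record): the verification procedure outlined in the abstract, scoring a claimant's feature vectors against a diagonal-covariance speaker GMM, comparing the score with a threshold, and counting accepted impostor utterances, might be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the use of an average per-frame log-likelihood as the score, and the toy one-dimensional model and threshold are all hypothetical, and how the thesis actually derives its threshold is not stated in the abstract.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """Log p(x | model) for a GMM, computed with log-sum-exp for stability."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def utterance_score(frames, model):
    """Average per-frame log-likelihood (an assumed scoring rule)."""
    weights, means, variances = model
    total = sum(gmm_log_likelihood(x, weights, means, variances) for x in frames)
    return total / len(frames)


def verify(frames, model, threshold):
    """Accept the claimant if the utterance score reaches the threshold."""
    return utterance_score(frames, model) >= threshold


def false_acceptance_rate(impostor_utterances, model, threshold):
    """Fraction of impostor utterances that are wrongly accepted."""
    accepted = sum(1 for u in impostor_utterances if verify(u, model, threshold))
    return accepted / len(impostor_utterances)


# Hypothetical toy model: one 1-D component with zero mean and unit variance.
toy_model = ([1.0], [[0.0]], [[1.0]])
# Two hypothetical impostor utterances of one frame each.
toy_impostors = [[[0.1]], [[5.0]]]
toy_far = false_acceptance_rate(toy_impostors, toy_model, threshold=-2.0)
```

In this toy setting, the first impostor utterance lies close to the model mean and is accepted, while the second lies far from it and is rejected, so half of the impostor trials count as false acceptances. A real system would use multi-dimensional spectral features and many components per speaker.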

Files in This Item:

File	Description	Size	Format
b14258468.pdf	For All Users (off-campus access for PolyU Staff & Students only)	3.09 MB	Adobe PDF

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/823