Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor | Department of Electronic and Information Engineering | en_US |
dc.creator | Cheung, Ming-cheung | - |
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/372 | - |
dc.language | English | en_US |
dc.publisher | Hong Kong Polytechnic University | - |
dc.rights | All rights reserved | en_US |
dc.title | Sensor fusion for audio-visual biometric authentication | en_US |
dcterms.abstract | Although financial transactions via automatic teller machines (ATMs) have become commonplace, the security of these transactions remains a concern. In particular, the verification approach used by today's ATMs can be easily compromised because ATM cards and passwords can be lost or stolen. To overcome this limitation, a new verification approach known as biometrics has emerged. Rather than using passwords as the means of verification, biometric systems verify the identity of a person based on his or her physiological and behavioral characteristics. Numerous studies have shown that biometric systems can achieve high performance under controlled conditions. However, the performance of these systems can be severely degraded in real-world environments. For example, background noise and channel distortion in speech-based systems, and variations in illumination intensity and lighting direction in face-based systems, are known to be the major causes of performance degradation. To enhance the robustness of biometric systems, multimodal biometrics have been introduced. Multimodal techniques improve the robustness of biometric systems by using more than one biometric trait at the same time. How to combine the information from different traits, however, is an important issue. This thesis proposes a multiple-source multiple-sample fusion algorithm to address this issue. The algorithm performs fusion at two levels: intramodal and intermodal (see the code sketch after this record). In intramodal fusion, the scores of multiple samples (e.g., utterances and video shots) obtained from the same modality are linearly combined, with fusion weights that depend on the score distribution of the independent samples and on prior knowledge about the score statistics. More specifically, enrollment data are used to compute the mean scores of clients and impostors, which are taken as the prior scores. During verification, the differences between the individual scores and the prior scores are used to compute the fusion weights. Because the fusion weights depend on the verification data, the positions of scores within the score sequences affect the final fused scores. To enhance the discrimination between client and impostor scores, this thesis proposes sorting the score sequences before fusion takes place. Because verification performance depends on the prior scores, a technique that adapts the prior scores during verification is also developed. In intermodal fusion, the means of the intramodal fused scores obtained from different modalities are fused by either linear weighted sums or support vector machines. The final fused score is then used for decision making. The intramodal multisample fusion was evaluated on the HTIMIT corpus and the 2001 NIST speaker recognition evaluation set, and the two-level fusion approach was evaluated on the XM2VTSDB audio-visual corpus. It was found that intramodal multisample fusion achieves a significant reduction in equal error rate compared to a conventional approach in which equal weights are assigned to all scores. Further improvement can be obtained by either sorting the score sequences or adapting the prior scores. It was also found that multisample fusion can be readily combined with support vector machines for audio-visual biometric authentication. Results show that combining the audio and visual information can reduce error rates by as much as 71%. | en_US |
dcterms.extent | vi, viii, 95 leaves : ill. (some col.) ; 30 cm | en_US |
dcterms.isPartOf | PolyU Electronic Theses | en_US |
dcterms.issued | 2005 | en_US |
dcterms.educationalLevel | All Master | en_US |
dcterms.educationalLevel | M.Phil. | en_US |
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | en_US |
dcterms.LCSH | Biometric identification | en_US |
dcterms.LCSH | Multisensor data fusion | en_US |
dcterms.LCSH | Automatic speech recognition | en_US |
dcterms.accessRights | open access | en_US |
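
The abstract above describes the two-level fusion scheme only at a high level and does not give the weighting formula. The Python sketch below illustrates one plausible reading of it: softmax-style intramodal weights driven by each sample score's deviation from the prior client and impostor means, followed by a linear weighted sum across modalities. The function names, the softmax weighting, the `temperature` parameter, and the example numbers are illustrative assumptions, not the thesis's actual equations; prior-score adaptation is omitted.

```python
import numpy as np


def intramodal_fuse(scores, prior_client, prior_impostor,
                    sort_scores=True, temperature=1.0):
    """Linearly combine multi-sample scores from one modality.

    Per the abstract, fusion weights depend on the differences between
    individual scores and the prior scores (client/impostor score means
    estimated from enrollment data). The softmax weighting here is an
    assumed instantiation, not the thesis's published formula.
    """
    s = np.asarray(scores, dtype=float)
    if sort_scores:
        # Sorting fixes each score's position in the sequence, which the
        # abstract reports improves client/impostor discrimination.
        s = np.sort(s)
    d_client = np.abs(s - prior_client)      # distance to client prior
    d_impostor = np.abs(s - prior_impostor)  # distance to impostor prior
    # Assumed heuristic: up-weight samples lying closer to the client
    # prior than to the impostor prior.
    evidence = (d_impostor - d_client) / temperature
    weights = np.exp(evidence - evidence.max())
    weights /= weights.sum()
    return float(np.dot(weights, s))


def intermodal_fuse(audio_score, visual_score, w_audio=0.5):
    """Linear weighted sum of the two intramodal fused scores.

    The thesis also evaluates a support vector machine at this stage;
    the weighted sum is the simpler of the two options it names.
    """
    return w_audio * audio_score + (1.0 - w_audio) * visual_score


# Hypothetical usage: scores from three utterances and three video shots.
audio = intramodal_fuse([1.8, 0.4, 2.1], prior_client=2.0, prior_impostor=-1.0)
visual = intramodal_fuse([0.9, 1.5, 1.2], prior_client=1.4, prior_impostor=-0.5)
fused = intermodal_fuse(audio, visual, w_audio=0.6)
print(f"fused score = {fused:.3f}")  # accept if above a decision threshold
```

After sorting, a given weight index always corresponds to the same rank in the score sequence, which mirrors the abstract's point that score positions affect the fused result when weights are data-dependent.
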
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
b18099749.pdf | For All Users | 1.72 MB | Adobe PDF | View/Open |
Copyright Undertaking
As a bona fide Library user, I declare that:
- I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
- I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
- I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.
By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.