Multi-lingual (Cantonese, Mandarin and English) speech recognition and voice response system

Li, Nga-ling Bavy

Author:	Li, Nga-ling Bavy
Title:	Multi-lingual (Cantonese, Mandarin and English) speech recognition and voice response system
Degree:	M.Phil.
Year:	2001
Subject:	Automatic speech recognition; Speech processing systems; Hong Kong Polytechnic University -- Dissertations
Department:	Department of Computing
Pages:	viii, 91 leaves : ill. ; 30 cm
Language:	English
Abstract:	As computer technology increasingly permeates our daily lives, hundreds of speech recognition applications are being deployed in business, industry and customer-service settings. Hong Kong is a multicultural city in which people communicate within their own groups in their native tongues, so such applications need to support the three common dialects: Cantonese, Mandarin and English. This thesis aims to build an integrated Automatic Speech Recognition (ASR) system for these three dialects without applying any prior linguistic knowledge. The system is constructed in three essential phases, each studied in this thesis: (1) Speech Segmentation, (2) Speech Preprocessing, and (3) Speech Recognition. The objectives of the thesis are: (1) to find a segmentation algorithm that works well for all three dialects without prior linguistic knowledge of any of them; (2) to apply different existing parametric representations to different speech recognition mechanisms and measure the range of improvement each gives for the three dialects; and (3) to design an integrated ASR system that produces better results across the three dialects. The overall performance of the proposed segmentation and recognition algorithms was also measured against several common existing algorithms. The experimental results show that the proposed Linguistically Free Segmentation (LFS) method is much more stable than the traditional Zero Crossing method, as measured by their standard deviations. They also show that existing parametric representations give varying degrees of improvement depending on the recognition mechanism and the dialect.
The best Cantonese recognition performance was achieved by feeding Mel-frequency Cepstral Coefficient (MFCC) features into Improved Naive Bayesian Classification (INBC), whereas the best performance for Mandarin and English was achieved by feeding MFCC features into Hidden Markov Models (HMMs) decoded with the Viterbi algorithm. These results indicate that an integrated ASR system, one that combines different algorithms for the segmentation, preprocessing and recognition phases, is needed to build a reliable speech understanding system for the various spoken languages used in society. Finally, the integrated ASR system for the three dialects was applied in a Zoological Fortune Telling application. We believe that an integrated ASR system could also serve as the basis of a Voice Response System, providing intelligent support for the millions of business transactions and customer-service enquiries handled every day. Such a system could improve traditional human-computer interaction by letting users retrieve or manipulate application output in different forms (speech, text, graphics, or sets of actions). This voice response component is left as a future enhancement of the integrated ASR system and is not emphasized in this thesis.
Rights:	All rights reserved
Access:	open access
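The abstract reports that MFCC features decoded by an HMM with the Viterbi algorithm gave the best results for Mandarin and English. As a reminder of what Viterbi decoding does, the sketch below finds the single most likely hidden-state sequence for an HMM. It is a generic textbook version, not the thesis's implementation: it assumes discrete observation symbols and a hypothetical emission table `emit_p`, whereas an MFCC front end produces continuous feature vectors that are normally modelled with (for example) Gaussian emission densities. The function name `viterbi` and all probabilities are illustrative assumptions.

```python
import math


def viterbi(obs, start_p, trans_p, emit_p):
    """Return (best_log_prob, best_state_path) for a discrete-observation HMM.

    Illustrative sketch only (assumed interface, not from the thesis):
      obs     : non-empty sequence of observation symbols (ints indexing emit_p columns)
      start_p : start_p[s] = P(state s at t = 0)
      trans_p : trans_p[i][j] = P(state j at t + 1 | state i at t)
      emit_p  : emit_p[s][o] = P(observation o | state s)
    Scores are kept in the log domain to avoid underflow on long utterances.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(start_p)
    # delta[s]: best log-probability of any state path that ends in s at the current time
    delta = [log(start_p[s]) + log(emit_p[s][obs[0]]) for s in range(n_states)]
    backptrs = []  # backptrs[t][j]: best predecessor of state j at step t + 1
    for o in obs[1:]:
        new_delta, ptrs = [], []
        for j in range(n_states):
            # choose the predecessor i that maximises delta[i] + log P(i -> j)
            best_i = max(range(n_states), key=lambda i: delta[i] + log(trans_p[i][j]))
            new_delta.append(delta[best_i] + log(trans_p[best_i][j]) + log(emit_p[j][o]))
            ptrs.append(best_i)
        delta = new_delta
        backptrs.append(ptrs)
    # backtrack from the best final state to recover the full path
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return delta[last], path
```

With the common two-state textbook model (start probabilities `[0.6, 0.4]`, transitions `[[0.7, 0.3], [0.4, 0.6]]`, emissions `[[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]`), the observation sequence `[0, 1, 2]` decodes to the state path `[0, 0, 1]` with probability 0.01512. Keeping one back-pointer per state and time step is what lets Viterbi return the best whole path in O(T·N²) time rather than enumerating all N^T paths.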

Files in This Item:

File	Description	Size	Format
b15995288.pdf	For All Users	3.13 MB	Adobe PDF

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/4905