Author: | Ko, Tat Leung |
Title: | Investigation of spatio-temporal networks for temporal sequence recognition |
Degree: | M.Sc. |
Year: | 1996 |
Subject: | Automatic speech recognition; Neural networks (Computer science); Hong Kong Polytechnic University -- Dissertations |
Department: | Multi-disciplinary Studies |
Pages: | vi, 82 leaves : ill. ; 30 cm |
Language: | English |
Abstract: | This investigation aims to verify the effectiveness of applying a spatio-temporal approach to temporal sequence recognition, in particular, speech recognition. Speech recognition is fundamentally a pattern classification task: it takes an input pattern, the speech waveform, and classifies it as one of a set of spoken words, phrases, or sentences. A Spatio-temporal Pattern (STP) can be defined as a time-correlated sequence of spatial patterns. With a spatio-temporal network, the speech data of a single word can be divided into a series of time frames, which are sent to the network one at a time for processing. This reduces both the complexity of the network and the processing time. One advantage of the spatio-temporal network is that it can be constructed dynamically, thus simulating the effect of training; one disadvantage is that different words with the same ending may produce ambiguous results. In addition, Recurrent Neural Networks (RNNs) were applied to recognize the speech data using the Real Time Recurrent Learning (RTRL) algorithm, a gradient-following learning algorithm for completely recurrent networks running in continually sampled time. The merit of the RTRL algorithm is its ability to process input data continuously, without requiring a fixed, or even bounded, epoch length. Its drawbacks are that it requires a great deal of computation on each update cycle and that it is non-local. The results show a recognition accuracy of about 75% for both algorithms. We found that spatio-temporal networks are more suitable for recognizing speech from a single speaker, whilst the RTRL algorithm is more appropriate for recognizing speech from multiple speakers. To increase accuracy with spatio-temporal networks, we found that the number of time frames for each word should be roughly the same.
For the RTRL algorithm, it is better to use the minimum squared error, rather than slope differences, to determine how closely the network output curve matches the desired output curve. |
Rights: | All rights reserved |
Access: | restricted access |
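The abstract describes RTRL only in outline. As an illustration, here is a minimal, hypothetical Python/NumPy sketch of the standard RTRL sensitivity recursion; it is not the code used in the thesis. The class name `RTRLNetwork`, the logistic units, the constant bias input, and treating the first `n_out` units as output units are all assumptions. The sketch shows why RTRL needs no fixed epoch length (weights can be updated after every sample) and why it is expensive and non-local: every unit keeps a sensitivity for every weight in the network, an array of n × n × (m + n + 1) values that is updated at each time step, which for n units costs on the order of n^4 operations.

```python
import numpy as np


class RTRLNetwork:
    """Fully recurrent network trained online with Real-Time Recurrent Learning.

    Hypothetical sketch: logistic units, one constant bias input, and the
    first `n_out` units treated as output units (all assumptions).
    """

    def __init__(self, n_in, n_units, n_out, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.n_in, self.n, self.n_out, self.lr = n_in, n_units, n_out, lr
        self.K = n_in + n_units + 1  # external inputs + unit outputs + bias
        self.W = rng.normal(0.0, 0.3, size=(n_units, self.K))
        self.reset()

    def reset(self):
        """Clear activations and sensitivities (e.g. before a new word)."""
        self.y = np.zeros(self.n)
        # P[k, i, j] holds d y_k / d W[i, j]; every unit tracks every weight.
        self.P = np.zeros((self.n, self.n, self.K))

    def step(self, x, target=None):
        """Advance one time step; if a target is given, update W immediately."""
        z = np.concatenate([x, self.y, [1.0]])
        s = self.W @ z
        y_new = 1.0 / (1.0 + np.exp(-s))
        fprime = y_new * (1.0 - y_new)  # derivative of the logistic function

        # Sensitivity recursion:
        # p^k_ij(t+1) = f'(s_k) * (sum_l w_kl p^l_ij(t) + delta_ik z_j(t))
        W_rec = self.W[:, self.n_in:self.n_in + self.n]
        term = np.tensordot(W_rec, self.P, axes=(1, 0))
        idx = np.arange(self.n)
        term[idx, idx, :] += z
        self.P = fprime[:, None, None] * term
        self.y = y_new

        if target is not None:
            # Only output units carry an error signal.
            e = np.zeros(self.n)
            e[:self.n_out] = target - y_new[:self.n_out]
            # Gradient step on 0.5 * sum(e^2): dW_ij = lr * sum_k e_k p^k_ij
            self.W += self.lr * np.tensordot(e, self.P, axes=(0, 0))
        return y_new[:self.n_out]


if __name__ == "__main__":
    # Toy task (hypothetical): output the previous input bit, one sample at a time.
    rng = np.random.default_rng(1)
    bits = rng.integers(0, 2, size=5000).astype(float)
    net = RTRLNetwork(n_in=1, n_units=4, n_out=1, lr=0.5)
    errs, prev = [], 0.0
    for b in bits:
        out = net.step(np.array([b]), target=np.array([prev]))
        errs.append(abs(prev - out[0]))
        prev = b
    print("mean |error|, first 500 steps:", np.mean(errs[:500]))
    print("mean |error|, last 500 steps:", np.mean(errs[-500:]))
```

Calling `reset()` between words clears the activations and sensitivities; the abstract does not say whether the thesis reset the network between words or let it run continually across them.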
Files in This Item:
File | Description | Size | Format
---|---|---|---
b12197993.pdf | For All Users (off-campus access for PolyU Staff & Students only) | 2.61 MB | Adobe PDF
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/4164
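The two practical recommendations at the end of the abstract (keep the number of time frames per word roughly the same, and judge RTRL output curves by squared error rather than slope differences) can be illustrated with a short, hypothetical sketch. It assumes each word has already been converted to a (T, d) array of per-sample feature vectors, forces every word into the same number of time frames by splitting it into equal contiguous chunks summarised by their means, and compares two sampled curves both by a sum of squared errors and by a "slope difference", here taken to be the summed absolute difference between successive slopes. The abstract defines neither the framing scheme nor either measure, so all function names and definitions below are illustrative assumptions.

```python
import numpy as np


def split_into_frames(features, n_frames):
    """Split a (T, d) feature sequence into exactly n_frames contiguous frames.

    Assumption: each frame is summarised by its mean vector, so every word
    yields the same number of frames regardless of its duration.
    """
    features = np.asarray(features, dtype=float)
    if len(features) < n_frames:
        raise ValueError("sequence shorter than the requested number of frames")
    chunks = np.array_split(features, n_frames)
    return np.stack([c.mean(axis=0) for c in chunks])


def squared_error(output, desired):
    """Sum of squared differences between two sampled curves."""
    output, desired = np.asarray(output, float), np.asarray(desired, float)
    return float(np.sum((output - desired) ** 2))


def slope_difference(output, desired):
    """Hypothetical slope measure: summed |difference| of successive slopes."""
    output, desired = np.asarray(output, float), np.asarray(desired, float)
    return float(np.sum(np.abs(np.diff(output) - np.diff(desired))))


if __name__ == "__main__":
    # Two hypothetical "words" of different durations, 12 features per sample.
    rng = np.random.default_rng(0)
    short_word = rng.normal(size=(40, 12))
    long_word = rng.normal(size=(95, 12))
    print(split_into_frames(short_word, 10).shape)  # (10, 12)
    print(split_into_frames(long_word, 10).shape)   # (10, 12)

    desired = np.array([0.0, 0.5, 1.0])
    shifted = desired + 0.4  # same shape, constant offset
    print(slope_difference(shifted, desired))  # close to 0: offset is invisible
    print(squared_error(shifted, desired))     # about 0.48
```

One property of this particular slope measure that may be relevant: it ignores a constant offset between the curves, so a shifted output scores as a near-perfect match, whereas the squared error still penalises it. Whether this is the reason behind the thesis's finding is not stated in the abstract.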