A study of phoneme synthesis with neural networks

Tung, Ching-hon

Author:	Tung, Ching-hon
Title:	A study of phoneme synthesis with neural networks
Degree:	M.Sc.
Year:	1995
Subject:	Neural networks (Computer science); Speech processing systems; Speech synthesis; Hong Kong Polytechnic University -- Dissertations
Department:	Multi-disciplinary Studies; Department of Electronic Engineering
Pages:	ii, 53 leaves : ill. (some col.) ; 30 cm
Language:	English
Abstract:	This report is divided into two parts. The first part is a preliminary study of the properties of Linear Prediction Coding (LPC)[1] and Artificial Neural Networks (ANN). Some phoneme units are digitised and recorded in Sound Blaster (SB) voice files. They are converted to LPC parameters and then back to SB files. The differences and distortion between the original and synthetic sounds are investigated. In addition, a multi-layer back-propagation network (MLP)[2] is constructed to map the encoded phonemes to LPC parameters of order 12. The function-mapping and memory properties of the MLP are demonstrated. The second part of the report addresses the main objective: a study of phoneme synthesis with neural networks. A real-time recurrent learning network (RTRL)[3][4] is constructed and its capability to achieve the task is investigated. First, the teacher-forced learning method described in [4] is used to train the network, and a stable oscillating output is achieved after training. The result shows that this learning method may not be able to achieve the task, because the stable oscillation cannot follow the variation of the speech waveform. Phonemes can be simply divided into vowels and consonants. Vowels are voiced and periodic, while consonants are voiceless and non-periodic. Since the network described in [3] can be applied to model a second-order IIR lowpass filter and can be used for function prediction, this suggests that the network may be trained to behave like a filter that generates the waveforms when given information about their frequency components as input. The network is modified to behave like a recursive filter that generates the waveforms from output feedback and some of their frequency components. Some simple vowels are then used to train the network, and the results are promising.
The network seems able to learn to generate periodic waveforms such as vowels, but the synthesis of non-periodic consonants remains unsolved.
Rights:	All rights reserved
Access:	restricted access
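
The abstract mentions converting recorded phonemes to LPC of order 12 and back again. As a rough illustration of what such a round trip involves, the sketch below estimates predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion, then resynthesises a frame with the resulting all-pole (recursive) filter. This is an assumption about one common way to do it, not the thesis's own procedure: the function names, the 400-sample test frame, and the toy filter coefficients 1.2 and -0.8 are all hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order=12):
    """Levinson-Durbin recursion (autocorrelation method).

    Returns (coeffs, err) for the predictor
        x_hat[n] = coeffs[0]*x[n-1] + ... + coeffs[order-1]*x[n-order]
    where err is the remaining prediction-error energy.
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def synthesize(coeffs, excitation):
    """All-pole filter: each output is the excitation plus fed-back outputs."""
    out = []
    for n, e in enumerate(excitation):
        feedback = sum(c * out[n - k]
                       for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(e + feedback)
    return out


# Hypothetical test frame: impulse response of a stable second-order filter.
excitation = [1.0] + [0.0] * 399
frame = synthesize([1.2, -0.8], excitation)

coeffs, err = lpc(frame, order=12)        # analysis (frame -> LPC)
resynth = synthesize(coeffs, excitation)  # synthesis (LPC -> frame)
print([round(c, 4) for c in coeffs[:2]])
```

For this toy frame the first two coefficients should come out close to 1.2 and -0.8, with the remaining ten close to zero. Real speech frames would normally be windowed first, and the excitation would have to be estimated rather than known in advance.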

Files in This Item:

File	Description	Size	Format
b15554168.pdf	For All Users (off-campus access for PolyU Staff & Students only)	3.94 MB	Adobe PDF

Copyright Undertaking

As a bona fide Library user, I declare that:

I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/2062