|Wong, Ka-hung Raymond
|Application of fuzzy logic to speech coding
|Speech processing systems
Hong Kong Polytechnic University -- Dissertations
|129 leaves : ill. ; 30 cm
|In speech coding, Linear Prediction Coding (LPC) is a powerful technique that is widely accepted in industrial and commercial applications. However, one major deficiency of the technique, when the conventional algorithm is used, is the loss of naturalness in the synthesized speech. Over the past few years, considerable research has investigated the factors that improve the speech quality of LPC systems. This research has shown that the quality of the synthesized speech is significantly affected by the accuracy of i) the estimated pitch periods and ii) the determination of the voiced and unvoiced regions of a speech segment. In commercial applications, the LPC speech data are edited by the "Speech Editor" to refine the quality of the synthesized speech. This editing often involves adjusting and optimizing LPC model parameters such as the pitch period and the voiced and unvoiced regions of a speech segment. This project investigated various ways to provide improved estimates of these parameters. Throughout the project, the intuitive knowledge of a group of skilled "Speech Editors" was extracted and analyzed to develop a fuzzy logic system that refines the quality of LPC synthesized speech. A fuzzy logic system development station was built specifically for this project. The two major speech editing components built into this fuzzy logic system are:
- A time-domain pitch period extraction algorithm.
- A voiced/unvoiced segment discriminator.
The pitch period extraction algorithm was developed from the knowledge of the domain expert. The voiced/unvoiced segment discriminator was built using fuzzy classification techniques, with zero-crossing density and average peak value as its inputs.
The fuzzy system parameters, such as category, range and degree of membership, were further fine-tuned during the training phase using the intuitive knowledge obtained from the "Speech Editors". The pitch contour of the synthesized speech produced by this system was compared with those generated by the Gold-Rabiner algorithm and by a commercial LPC system. The results show that this fuzzy system can generate synthesized speech with good intelligibility. Furthermore, its pitch period extraction is found to perform better than that of the other two conventional systems.
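To make the idea of a fuzzy voiced/unvoiced discriminator more concrete, here is a minimal Python sketch that takes zero-crossing density and average peak value as inputs, as the abstract describes. It is not the thesis's implementation: the function names, the ramp-shaped membership functions, the two rules and every threshold value are assumptions chosen only for illustration. In the thesis, such parameters were tuned from the knowledge of the "Speech Editors".

```python
import math
import random


def zero_crossing_density(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def average_peak_value(frame):
    """Mean height of the local maxima of the absolute signal."""
    mags = [abs(x) for x in frame]
    peaks = [
        mags[i]
        for i in range(1, len(mags) - 1)
        if mags[i] >= mags[i - 1] and mags[i] > mags[i + 1]
    ]
    return sum(peaks) / len(peaks) if peaks else 0.0


def ramp_down(x, lo, hi):
    """Membership that is 1 below lo, 0 above hi, and linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def ramp_up(x, lo, hi):
    """Complement of ramp_down: 0 below lo, 1 above hi."""
    return 1.0 - ramp_down(x, lo, hi)


def classify_frame(frame, zcd_lo=0.1, zcd_hi=0.3, peak_lo=0.05, peak_hi=0.2):
    """Return (label, voiced_degree, unvoiced_degree) for one frame.

    All four thresholds are illustrative assumptions, not values from the thesis.
    """
    zcd = zero_crossing_density(frame)
    apv = average_peak_value(frame)
    # Rule 1: IF density is LOW AND peak is HIGH THEN voiced (min acts as AND).
    voiced = min(ramp_down(zcd, zcd_lo, zcd_hi), ramp_up(apv, peak_lo, peak_hi))
    # Rule 2: IF density is HIGH OR peak is LOW THEN unvoiced (max acts as OR).
    unvoiced = max(ramp_up(zcd, zcd_lo, zcd_hi), ramp_down(apv, peak_lo, peak_hi))
    label = "voiced" if voiced > unvoiced else "unvoiced"
    return label, voiced, unvoiced


# Synthetic test frames (30 ms at an assumed 8 kHz sampling rate).
voiced_frame = [0.5 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(240)]
rng = random.Random(0)
noise_frame = [rng.uniform(-0.02, 0.02) for _ in range(240)]

print(classify_frame(voiced_frame)[0])
print(classify_frame(noise_frame)[0])
```

The strong, slowly varying sine frame is labelled voiced and the weak noise frame unvoiced. With these particular rules the two degrees always sum to one. A system with more categories or overlapping rules, as the tuning of "category, range and degree of membership" suggests, would not have that property.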
|All rights reserved