Author: Fung, Kai-tat
Title: Scalable video and audio techniques for video conferencing
Degree: M.Phil.
Year: 2001
Subject: Videoconferencing
Hong Kong Polytechnic University -- Dissertations
Department: Department of Electronic and Information Engineering
Pages: xiv, 110 leaves : ill. ; 30 cm
Language: English
Abstract: With advances in video and audio compression and in networking technologies, networked multimedia services such as multipoint video conferencing, video on demand and digital TV are emerging. We envision a central server, the multipoint control unit (MCU), that may have to provide quality of service to heterogeneous clients or transmission channels; in this scenario, the server must be able to perform video transcoding and audio mixing. In video transcoding, the conventional approach decodes the incoming video bitstream to the pixel domain and re-encodes the decoded frames at the output bitrate dictated by the capability of the clients' devices and the available network bandwidth. This incurs high computational complexity, large memory requirements, long delay and degraded video quality. In audio mixing, the speech signal is usually corrupted by background noise from the other channels, which degrades speech quality. The aim of this study is to reduce computational complexity while providing good video and audio quality in video conferencing. This thesis focuses on four major aspects of a video conferencing system: video transcoding in multipoint video conferencing, wavelet-based video coding, speech recovery and audio coding. The first half of the thesis is concerned with video processing and the second half with audio processing. In the first half, a new frame-skipping transcoder is proposed that greatly reduces computational complexity and limits quality degradation. The proposed architecture operates mainly in the discrete cosine transform (DCT) domain to keep the transcoder's complexity low. The re-encoding error of the frame-skipping transcoder is observed to be significantly reduced when the DCT coefficients are summed directly.
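The direct-summation idea rests on the linearity of the DCT. When frame n-1 is dropped and a macroblock of frame n points to a block-aligned macroblock of frame n-1, the block can be re-referenced to frame n-2 by adding the two motion vectors and adding the two residuals, and the residual sum can be formed from DCT coefficients without an inverse DCT. The sketch below is a hedged illustration under that block-aligned assumption only; the helper names (dct2, compose_motion_vectors, skipped_frame_residual_dct) are hypothetical, re-quantization is not shown, and non-aligned vectors would need DCT-domain motion compensation, which this sketch omits. It is not the thesis's transcoder architecture.

```python
import math

N = 8  # 8x8 blocks, as in DCT-based video coders


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * acc
    return out


def add_blocks(a, b):
    """Element-wise sum of two N x N blocks."""
    return [[a[i][j] + b[i][j] for j in range(N)] for i in range(N)]


def compose_motion_vectors(mv_current, mv_skipped):
    """Hypothetical helper: vector from frame n to frame n-2 when frame n-1 is
    dropped, assuming frame n's vector lands on a block-aligned macroblock."""
    return (mv_current[0] + mv_skipped[0], mv_current[1] + mv_skipped[1])


def skipped_frame_residual_dct(dct_res_skipped, dct_res_current):
    """Residual of frame n re-referenced to frame n-2, formed in the DCT domain.
    Valid because the DCT is linear: DCT(r1 + r2) = DCT(r1) + DCT(r2)."""
    return add_blocks(dct_res_skipped, dct_res_current)


r_skipped = [[(3 * i + j) % 7 - 3 for j in range(N)] for i in range(N)]
r_current = [[i - j for j in range(N)] for i in range(N)]
direct = skipped_frame_residual_dct(dct2(r_skipped), dct2(r_current))
via_pixels = dct2(add_blocks(r_skipped, r_current))
err = max(abs(direct[i][j] - via_pixels[i][j]) for i in range(N) for j in range(N))
print(err < 1e-9)                                # True (equal up to rounding)
print(compose_motion_vectors((1, -2), (3, 1)))   # (4, -1)
```

In a real transcoder the summed coefficients would be dequantized values that are then re-quantized once at the output rate, which avoids the decode and re-encode loop through the pixel domain.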
With the proposed frame-skipping transcoder, the video quality of the active sub-sequences can be improved significantly. In addition, most video conferencing systems use DCT-based encoders, which exhibit visually annoying blocking artifacts at low bit rates. Wavelets have recently been used in Internet applications; their major advantage over conventional DCT-based video encoders is high quality without blocking artifacts. Although a wavelet-based coder can achieve good quality, its computational speed is a concern. Motivated by this, a new region-based video coder architecture is proposed to achieve good video quality with low complexity. The proposed coder is based on an adaptive region-based updating technique in which the video is updated according to its motion activity, and a simple, fast object-tracking technique is proposed to locate the region of interest. Features of the proposal include (i) user-specified selection of the region of interest, which the user can change at any time, and (ii) adaptive bit allocation that lets the user specify the relative quality of the foreground and the background, increasing interactivity. This architecture guarantees high video quality in the region of interest while reducing the overall bit rate and computation time, even at low bit rates. In the second half of the thesis, we address the problem of speech enhancement, namely recovering a speech source from a mixture of its delayed versions and additive noise. Using a constrained optimisation technique, an algorithm based on second-order statistics is developed. The proposed algorithm imposes no strong assumptions on the speech signal or the noise. Simulation results show that our algorithm performs better than other algorithms.
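The region-based coder combines three ideas: update only blocks with motion activity, track the region of interest cheaply, and split the bit budget between foreground and background according to a user-chosen quality ratio. The sketch below illustrates one simple reading of each step and is not the thesis's algorithm. Assumptions: motion activity is the mean absolute frame difference per 8x8 block, "tracking" is the bounding box of active blocks, and "relative quality" is modelled as a ratio of bits per pixel. All function names are hypothetical; split_bits works equally with a region the user selects by hand.

```python
BLOCK = 8  # assumed block size in pixels


def block_activity(prev, cur, block=BLOCK):
    """Mean absolute frame difference per block, keyed by (block_row, block_col)."""
    h, w = len(cur), len(cur[0])
    activity = {}
    for by in range(0, h, block):
        for bx in range(0, w, block):
            total, count = 0, 0
            for y in range(by, min(by + block, h)):
                for x in range(bx, min(bx + block, w)):
                    total += abs(cur[y][x] - prev[y][x])
                    count += 1
            activity[(by // block, bx // block)] = total / count
    return activity


def blocks_to_update(activity, threshold):
    """Blocks whose motion activity exceeds the threshold; only these are re-coded."""
    return sorted(k for k, v in activity.items() if v > threshold)


def roi_bounding_box(blocks):
    """Crude tracker (assumption): the ROI is the bounding box of active blocks."""
    if not blocks:
        return None
    rows = [r for r, _ in blocks]
    cols = [c for _, c in blocks]
    return (min(rows), min(cols), max(rows), max(cols))


def roi_pixel_area(box, block=BLOCK):
    """Pixel area covered by a (row0, col0, row1, col1) box of blocks."""
    r0, c0, r1, c1 = box
    return (r1 - r0 + 1) * (c1 - c0 + 1) * block * block


def split_bits(total_bits, fg_area, bg_area, quality_ratio):
    """Give the foreground `quality_ratio` times as many bits per pixel as the background."""
    weighted = quality_ratio * fg_area + bg_area
    fg_bits = round(total_bits * quality_ratio * fg_area / weighted)
    return fg_bits, total_bits - fg_bits


prev = [[0] * 32 for _ in range(32)]
cur = [[50 if 8 <= y < 16 and 16 <= x < 32 else 0 for x in range(32)] for y in range(32)]
active = blocks_to_update(block_activity(prev, cur), threshold=2.0)
box = roi_bounding_box(active)
fg = roi_pixel_area(box)
print(active, box, split_bits(1000, fg, 32 * 32 - fg, quality_ratio=3.0))
# [(1, 2), (1, 3)] (1, 2, 1, 3) (300, 700)
```

With a ratio of 3.0, the 128 foreground pixels receive about 2.34 bits per pixel and the 896 background pixels about 0.78, so raising the ratio shifts quality toward the region of interest without changing the total budget.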
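To make the speech-recovery setting concrete, the sketch below uses a much simpler technique than the thesis's constrained optimisation: it models the mixture as a single echo, x[t] = s[t] + a*s[t-D] + noise, estimates D and a from second-order statistics (the normalised autocorrelation, where rho(D) = a/(1+a^2) for a white source), and removes the echo with a recursive inverse filter. This is a swapped-in illustration only. It assumes a white-like source, so real speech, with its strong short-lag correlation, would need prewhitening or a restricted lag search; all names are hypothetical.

```python
import math
import random


def autocorr(x, lag):
    """Unnormalised sample autocorrelation of x at a non-negative lag."""
    return sum(x[t] * x[t - lag] for t in range(lag, len(x)))


def estimate_echo(x, max_lag):
    """Delay = lag with the largest |normalised autocorrelation|; gain = root of
    rho(D) = a / (1 + a^2) with |a| < 1. Assumes a white-like source."""
    r0 = autocorr(x, 0)
    rho = {lag: autocorr(x, lag) / r0 for lag in range(1, max_lag + 1)}
    delay = max(rho, key=lambda lag: abs(rho[lag]))
    p = rho[delay]
    if p == 0.0:
        return delay, 0.0
    gain = (1.0 - math.sqrt(max(0.0, 1.0 - 4.0 * p * p))) / (2.0 * p)
    return delay, gain


def remove_echo(x, delay, gain):
    """Inverse filter s_hat[t] = x[t] - gain * s_hat[t - delay] (stable for |gain| < 1)."""
    s_hat = []
    for t, xt in enumerate(x):
        past = s_hat[t - delay] if t >= delay else 0.0
        s_hat.append(xt - gain * past)
    return s_hat


def synthetic_echo(n, delay, gain, noise_std, seed):
    """White Gaussian 'source' plus one delayed copy and additive noise."""
    rng = random.Random(seed)
    s = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [s[t] + (gain * s[t - delay] if t >= delay else 0.0)
         + rng.gauss(0.0, noise_std) for t in range(n)]
    return s, x


def mse(u, v):
    return sum((p - q) ** 2 for p, q in zip(u, v)) / len(u)


s, x = synthetic_echo(n=20000, delay=5, gain=0.6, noise_std=0.05, seed=0)
delay, gain = estimate_echo(x, max_lag=20)
s_hat = remove_echo(x, delay, gain)
print(delay, round(gain, 2), round(mse(x, s), 3), round(mse(s_hat, s), 3))
```

The recursive filter amplifies noise by roughly 1/(1 - a^2), which is one reason more careful formulations, such as the constrained approach the abstract describes, are needed when the noise is strong.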
Finally, although MPEG Audio provides perceptually lossless audio compression, its bitrate and computational complexity are higher than those of conventional speech coding. Motivated by this, a fast bit allocation algorithm for the MPEG audio encoder is proposed that generates a bitstream identical to the one produced by the standard bit allocation algorithm described in the MPEG audio standard. The proposed algorithm uses the bit allocation of the previous frame as a reference for distributing the limited bits among the 32 subbands of the current frame, so that the number of iterations is significantly reduced. Results show that the proposed bit allocation algorithm performs well at different encoding bitrates. This thesis shows that significant gains in computation and scalability can be achieved with our adaptive approaches, and that these techniques can make video conferencing more scalable while providing good video and audio quality in practical situations.
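The warm-start idea can be sketched with a toy version of the iterative allocation loop: repeatedly give one more quantizer level to the subband with the highest noise-to-mask ratio (SMR minus SNR) until the budget is spent. The sketch assumes a uniform cost model (every level costs 12 bits and buys about 6.02 dB), which real MPEG-1 Layer I/II allocation tables and scale-factor side information do not follow, and it uses 8 subbands instead of 32 for brevity. The constants and function names are hypothetical. Under this uniform model, starting from the previous frame's allocation and repairing it (add, remove, or swap single levels) provably ends at the same allocation as the from-zero loop, because both select the highest-priority steps; it illustrates the abstract's claim but is not the thesis's algorithm.

```python
STEP_DB = 6.02        # assumed SNR gain (dB) per extra level
BITS_PER_LEVEL = 12   # assumed cost: 12 samples per subband, one more bit each
MAX_LEVEL = 15        # assumed cap on a subband's allocation level


def priority(smr, i, level):
    """Noise-to-mask ratio of subband i at `level`; index breaks ties."""
    return (smr[i] - STEP_DB * level, -i)


def levels_in_budget(bit_budget, n):
    """Number of one-level steps the frame's bit budget can pay for."""
    return min(bit_budget // BITS_PER_LEVEL, n * MAX_LEVEL)


def greedy_allocation(smr, bit_budget):
    """Reference loop: from zero, give one level at a time to the worst-NMR subband."""
    n = len(smr)
    alloc = [0] * n
    iterations = 0
    for _ in range(levels_in_budget(bit_budget, n)):
        open_bands = [i for i in range(n) if alloc[i] < MAX_LEVEL]
        best = max(open_bands, key=lambda i: priority(smr, i, alloc[i]))
        alloc[best] += 1
        iterations += 1
    return alloc, iterations


def warm_start_allocation(smr, bit_budget, prev_alloc):
    """Start from the previous frame's allocation; fix the total, then swap the
    least useful allocated level for the most useful free one until none helps."""
    n = len(smr)
    alloc = [min(max(a, 0), MAX_LEVEL) for a in prev_alloc]
    target = levels_in_budget(bit_budget, n)
    iterations = 0
    while True:
        used = [i for i in range(n) if alloc[i] > 0]
        open_bands = [i for i in range(n) if alloc[i] < MAX_LEVEL]
        worst = min(used, key=lambda i: priority(smr, i, alloc[i] - 1), default=None)
        best = max(open_bands, key=lambda i: priority(smr, i, alloc[i]), default=None)
        total = sum(alloc)
        if total > target:
            alloc[worst] -= 1
        elif total < target:
            alloc[best] += 1
        elif (worst is not None and best is not None
              and priority(smr, worst, alloc[worst] - 1) < priority(smr, best, alloc[best])):
            alloc[worst] -= 1
            alloc[best] += 1
        else:
            return alloc, iterations
        iterations += 1


smr = [20.0, 15.0, 12.0, 8.0, 5.0, 3.0, -2.0, -6.0]
print(greedy_allocation(smr, 240))                                  # ([5, 4, 3, 3, 2, 2, 1, 0], 20)
print(warm_start_allocation(smr, 240, [4, 4, 3, 3, 2, 2, 1, 1]))    # ([5, 4, 3, 3, 2, 2, 1, 0], 1)
```

When consecutive frames have similar masking, the previous allocation is already close to the answer, so the repair loop needs far fewer iterations than allocating from zero.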
Rights: All rights reserved
Access: open access

Files in This Item:
b15995318.pdf (for all users; 3.77 MB; Adobe PDF)