Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor | Department of Electrical and Electronic Engineering | en_US |
| dc.contributor.advisor | Chan, Yui-lam (EEE) | en_US |
| dc.creator | Huang, Ziyin | - |
| dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/14236 | - |
| dc.language | English | en_US |
| dc.publisher | Hong Kong Polytechnic University | en_US |
| dc.rights | All rights reserved | en_US |
| dc.title | Screen content video quality enhancement (SCVQE) based on machine learning | en_US |
| dcterms.abstract | The increasing popularity of intelligent terminals has led to a higher demand for screen content videos (SCVs). Applications such as cloud gaming, video conferencing, and online education rely heavily on Screen Content Coding (SCC). The COVID-19 pandemic in 2020 further accelerated the adoption of online education and virtual conferencing, making SCC indispensable for effective screen sharing. This paradigm shift has elevated SCVs from a niche format to mainstream media. Consequently, enhancing the quality of SCVs has become a critical challenge. In this thesis, we conduct an in-depth study on deep-learning-based video quality enhancement (VQE) of SCC-compressed videos and propose effective learning frameworks based on the characteristics of SCVs. | en_US |
| dcterms.abstract | Firstly, we study the dedicated tools in the SCC standard, the Intra Block Copy (IBC) and palette (PLT) modes, which introduce characteristic compression losses in the decoded video. We therefore propose a novel post-processing network that enhances decoded screen content videos based on the coding mode information embedded in the coded bitstream. By fusing three binary mode masks derived from the dedicated coding tools with the corresponding decoded frame, we aim to elevate the quality of SCVs. | en_US |
| dcterms.abstract | Secondly, unlike natural videos, screen content videos often feature abrupt scene switches and frame-freezing instances, which lead to visible distortions in compressed videos. Existing alignment-based models struggle to enhance scene-switch frames effectively and lack efficiency when dealing with frame freezing. We therefore propose a novel alignment-free method that handles both scene switches and frame freezing. In our approach, a spatial and temporal feature extraction module compresses and extracts spatio-temporal information from three groups of frame inputs, enabling efficient handling of scene switches. In addition, an edge-aware block is proposed to extract edge information, which guides the model to focus on restoring high-frequency components in frame-freezing situations. A fusion module is then designed to adaptively fuse the features from the three groups, taking the different positions of the video frames into account, to enhance frames in scene-switch and frame-freezing scenarios. | en_US |
| dcterms.abstract | Thirdly, existing multiple-frame models that use a fixed range of neighboring frames struggle to enhance frames during scene switches and lack efficiency in reconstructing high-frequency information. To address these limitations, we present a novel method proficient in managing scene switches and reconstructing high-frequency information. In the feature extraction part, we develop long-term and short-term feature extraction streams: the long-term stream learns contextual information, while the short-term stream extracts more closely related information from a shorter input to assist the long-term stream in handling fast motion and scene switches. To further improve frame quality during scene switches, we incorporate a similarity-based neighbor frame selector before feeding frames into the short-term stream; this selector identifies relevant neighboring frames, aiding the efficient handling of scene switches. To dynamically fuse the short-term and long-term features, a multi-scale feature distillation module adaptively recalibrates channel-wise feature responses to achieve effective feature distillation. In the reconstruction part, a high-frequency reconstruction block is proposed to guide the model in restoring high-frequency components. | en_US |
| dcterms.abstract | The frameworks proposed in this thesis are evaluated through comparisons with other state-of-the-art methods on both posed and in-the-wild databases. Ablation studies and robustness tests confirm the promising performance of our frameworks, highlighting the efficacy of the novel designs in enhancing screen content quality. | en_US |
| dcterms.extent | 126 pages : color illustrations | en_US |
| dcterms.isPartOf | PolyU Electronic Theses | en_US |
| dcterms.issued | 2025 | en_US |
| dcterms.educationalLevel | Ph.D. | en_US |
| dcterms.educationalLevel | All Doctorate | en_US |
| dcterms.LCSH | Video compression | en_US |
| dcterms.LCSH | Machine learning | en_US |
| dcterms.LCSH | Image processing | en_US |
| dcterms.LCSH | Digital video | en_US |
| dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | en_US |
| dcterms.accessRights | open access | en_US |
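The first abstract paragraph describes fusing three binary coding-mode masks with the decoded frame before post-processing. As a rough illustration only (the thesis's actual network and input layout are not specified here; all names, shapes, and the channel-concatenation choice below are assumptions), the input construction might look like:

```python
import numpy as np

def build_network_input(decoded_frame, ibc_mask, plt_mask, other_mask):
    """Stack a decoded frame with three binary coding-mode masks.

    decoded_frame: (H, W, 3) float array in [0, 1]
    *_mask: (H, W) binary arrays marking pixels coded with each mode
    Returns an (H, W, 6) array that a post-processing CNN could consume.
    Hypothetical sketch, not the author's implementation.
    """
    masks = np.stack([ibc_mask, plt_mask, other_mask], axis=-1)
    return np.concatenate([decoded_frame, masks.astype(decoded_frame.dtype)], axis=-1)

# Toy example: a 4x4 frame whose top-left 2x2 block was coded with IBC.
frame = np.random.rand(4, 4, 3)
ibc = np.zeros((4, 4)); ibc[:2, :2] = 1.0
plt_m = np.zeros((4, 4))
other = 1.0 - np.maximum(ibc, plt_m)  # pixels coded by conventional modes
x = build_network_input(frame, ibc, plt_m, other)
print(x.shape)  # (4, 4, 6)
```

The point of such a fusion is that the network can condition its restoration on which coding tool produced each region, since IBC and PLT blocks exhibit different artifact patterns.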

