Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor | Department of Electrical and Electronic Engineering | en_US |
| dc.contributor.advisor | Chan, Yui-lam (EEE) | en_US |
| dc.creator | Huang, Ziyin | - |
| dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/14236 | - |
| dc.language | English | en_US |
| dc.publisher | Hong Kong Polytechnic University | en_US |
| dc.rights | All rights reserved | en_US |
| dc.title | Screen content video quality enhancement (SCVQE) based on machine learning | en_US |
| dcterms.abstract | The increasing popularity of intelligent terminals has led to a higher demand for screen content videos (SCVs). Applications such as cloud gaming, video conferencing, and online education rely heavily on Screen Content Coding (SCC). The COVID-19 pandemic in 2020 further accelerated the adoption of online education and virtual conferencing, making SCC indispensable for effective screen sharing. This paradigm shift has elevated SCVs from a niche format to mainstream media. Consequently, enhancing the quality of SCVs has become a critical challenge. In this thesis, we conduct an in-depth study on deep-learning-based video quality enhancement (VQE) of SCC-compressed videos and propose effective learning frameworks based on the characteristics of SCVs. | en_US |
| dcterms.abstract | Firstly, we study the dedicated tools in the SCC standard, the Intra Block Copy (IBC) and palette (PLT) modes, which introduce characteristic compression losses in the decoded video. We therefore propose a novel post-processing network that enhances decoded screen content videos based on the coding mode information embedded in the coded bitstream. By fusing three binary mode masks derived from the dedicated coding tools with the corresponding decoded frame, we aim to elevate the quality of SCVs. | en_US |
| dcterms.abstract | Secondly, unlike natural videos, screen content videos often feature abrupt scene switches and frame-freezing instances, which lead to visible distortions in compressed videos. Existing alignment-based models struggle to enhance scene-switch frames effectively and lack efficiency when dealing with frame freezing. We therefore propose a novel alignment-free method that handles both scene switches and frame freezing. In our approach, a spatial and temporal feature extraction module compresses and extracts spatio-temporal information from three groups of frame inputs, enabling efficient handling of scene switches. In addition, an edge-aware block is proposed to extract edge information, which guides the model to focus on restoring high-frequency components in frame-freezing situations. A fusion module is then designed to adaptively fuse the features from the three groups, taking the different positions of the video frames into account, to enhance frames in scene-switch and frame-freezing scenarios. | en_US |
| dcterms.abstract | Thirdly, existing multiple-frame models that use a fixed range of neighboring frames struggle to enhance frames during scene switches and lack efficiency in reconstructing high-frequency information. To address these limitations, we present a novel method proficient in managing scene switches and reconstructing high-frequency information. In the feature extraction part, we develop long-term and short-term feature extraction streams: the long-term stream learns contextual information, while the short-term stream extracts more closely related information from a shorter input to assist the long-term stream in handling fast motion and scene switches. To further improve frame quality during scene switches, we incorporate a similarity-based neighbor frame selector before feeding frames into the short-term stream; this selector identifies relevant neighboring frames, aiding the efficient handling of scene switches. To dynamically fuse the short-term and long-term features, a multi-scale feature distillation module adaptively recalibrates channel-wise feature responses to achieve effective feature distillation. In the reconstruction part, a high-frequency reconstruction block is proposed to guide the model in restoring high-frequency components. | en_US |
| dcterms.abstract | The frameworks proposed in this thesis are evaluated through comparisons with other state-of-the-art methods on both posed and in-the-wild databases. Ablation studies and robustness tests confirm the promising performance of our frameworks, highlighting the efficacy of the novel designs in enhancing screen content quality. | en_US |
| dcterms.extent | 126 pages : color illustrations | en_US |
| dcterms.isPartOf | PolyU Electronic Theses | en_US |
| dcterms.issued | 2025 | en_US |
| dcterms.educationalLevel | Ph.D. | en_US |
| dcterms.educationalLevel | All Doctorate | en_US |
| dcterms.LCSH | Video compression | en_US |
| dcterms.LCSH | Machine learning | en_US |
| dcterms.LCSH | Image processing | en_US |
| dcterms.LCSH | Digital video | en_US |
| dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | en_US |
| dcterms.accessRights | open access | en_US |
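The first abstract paragraph describes fusing three binary coding-mode masks with the decoded frame before post-processing. As a rough illustration only (the thesis's actual network and input layout are not specified here; all names, shapes, and the channel-concatenation choice below are assumptions), the input construction might look like:

```python
import numpy as np

def build_network_input(decoded_frame, ibc_mask, plt_mask, other_mask):
    """Stack a decoded frame with three binary coding-mode masks.

    decoded_frame: (H, W, 3) float array in [0, 1]
    *_mask: (H, W) binary arrays marking pixels coded with each mode
    Returns an (H, W, 6) array that a post-processing CNN could consume.
    Hypothetical sketch, not the author's implementation.
    """
    masks = np.stack([ibc_mask, plt_mask, other_mask], axis=-1)
    return np.concatenate([decoded_frame, masks.astype(decoded_frame.dtype)], axis=-1)

# Toy example: a 4x4 frame whose top-left 2x2 block was coded with IBC.
frame = np.random.rand(4, 4, 3)
ibc = np.zeros((4, 4)); ibc[:2, :2] = 1.0
plt_m = np.zeros((4, 4))
other = 1.0 - np.maximum(ibc, plt_m)  # pixels coded by conventional modes
x = build_network_input(frame, ibc, plt_m, other)
print(x.shape)  # (4, 4, 6)
```

The point of such a fusion is that the network can condition its restoration on which coding tool produced each region, since IBC and PLT blocks exhibit different artifact patterns.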

