Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor | Department of Computing | en_US |
dc.contributor.advisor | Zhang, Lei (COMP) | en_US |
dc.creator | Yang, Xi | - |
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/13239 | - |
dc.language | English | en_US |
dc.publisher | Hong Kong Polytechnic University | en_US |
dc.rights | All rights reserved | en_US |
dc.title | Towards effective and efficient real-world video super-resolution | en_US |
dcterms.abstract | With the rapid development of consumer electronics and the Internet, we are entering an era of high-definition visual media. In this era, video resolution plays a pivotal role in the quality of the viewer’s experience. With the insatiable demand for higher-resolution content, video super-resolution (VSR) has emerged as a critical area of research within the field of computer vision. VSR refers to the process of reconstructing a high-resolution (HR) video from its low-resolution (LR) counterpart. This not only enhances the visual experience for end-users but also has practical applications in surveillance, medical imaging, and the digital restoration of archival footage. | en_US |
dcterms.abstract | The challenge of video super-resolution lies in accurately inferring high-frequency details that are not present in the low-resolution source. Early techniques in VSR were largely based on interpolation methods, which often resulted in artifacts such as blurring and aliasing. The advent of machine learning, particularly deep learning, has revolutionized this field by enabling more sophisticated approaches that can learn complex mappings from LR to HR content, utilizing temporal coherence and contextual information across video frames. | en_US |
dcterms.abstract | In this thesis, we embrace the recent advance of deep neural networks (DNNs) to address the challenges in VSR research, aiming to achieve effective and efficient real-world video super-resolution performance. | en_US |
dcterms.abstract | In Chapter 1, we review related works and discuss the contributions and organization of this thesis. | en_US |
dcterms.abstract | In Chapter 2, we develop an efficient VSR algorithm with a flow-guided deformable attention propagation module, targeting the real-time online setting, which fits the needs of online streaming applications such as streaming media and video surveillance. The flow-guided deformable attention propagation module leverages the correspondence priors provided by a fast optical flow network in deformable attention computation and consequently helps propagate recurrent state information effectively and efficiently. The proposed algorithm achieves state-of-the-art results on widely used VSR benchmark datasets in terms of both effectiveness and efficiency. | en_US |
dcterms.abstract | In Chapter 3, we build the first real-world VSR dataset, aiming to bridge the synthetic-to-real gap in previous VSR research and pave the way towards real-world VSR. To train VSR models more effectively on the proposed dataset, we propose a decomposition-based loss that considers the characteristics of the constructed dataset. Experiments validate that VSR models trained on our RealVSR dataset demonstrate better visual quality than those trained on synthetic datasets under real-world settings, and they also exhibit good generalization capability in cross-camera tests. | en_US |
dcterms.abstract | In Chapter 4, we propose a motion-guided latent diffusion (MGLD) based VSR algorithm, which achieves highly competitive real-world VSR results, exhibiting perceptually much more realistic details with fewer flickering artifacts than existing state-of-the-art methods. To tackle the ill-posedness of the real-world VSR problem, we leverage the powerful generation capability of a large pre-trained text-to-image diffusion model. To improve temporal consistency, we propose a motion-guided sampling strategy and fine-tune the variational decoder with an innovative sequence-oriented loss. | en_US |
dcterms.abstract | In Chapter 5, we develop a VSR algorithm by harnessing the capabilities of a robust video diffusion generation prior, achieving temporally consistent and high-quality VSR outcomes. To effectively utilize the video diffusion prior for VSR, we implement a ControlNet-style mechanism to manage the sequence VSR process and fine-tune the model on a large-scale video dataset. The powerful video diffusion prior, coupled with our control design, enables the model to achieve commendable VSR results at the segment level. To ensure seamless continuity between segments and maintain long-term consistency, we further craft a segment-based recurrent inference pipeline. | en_US |
dcterms.abstract | In summary, our works contribute to the development of VSR research by designing more efficient network architectures to boost the efficiency of real-world VSR algorithms, addressing the lack of real-world VSR benchmark datasets, and developing more effective real-world VSR algorithms by exploiting image and video diffusion priors. | en_US |
dcterms.extent | xviii, 127 pages : color illustrations | en_US |
dcterms.isPartOf | PolyU Electronic Theses | en_US |
dcterms.issued | 2024 | en_US |
dcterms.educationalLevel | Ph.D. | en_US |
dcterms.educationalLevel | All Doctorate | en_US |
dcterms.LCSH | High resolution imaging | en_US |
dcterms.LCSH | Image processing -- Digital techniques | en_US |
dcterms.LCSH | Neural networks (Computer science) | en_US |
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | en_US |
dcterms.accessRights | open access | en_US |