Author: | Yang, Xi |
Title: | Towards effective and efficient real-world video super-resolution |
Advisors: | Zhang, Lei (COMP) |
Degree: | Ph.D. |
Year: | 2024 |
Subject: | High resolution imaging; Image processing -- Digital techniques; Neural networks (Computer science); Hong Kong Polytechnic University -- Dissertations |
Department: | Department of Computing |
Pages: | xviii, 127 pages : color illustrations |
Language: | English |
Abstract: | With the rapid development of consumer electronics and the Internet, we are entering an era of high-definition visual media, in which the resolution of videos plays a pivotal role in the quality of the viewer’s experience. With the insatiable demand for higher-resolution content, video super-resolution (VSR) has emerged as a critical area of research within the field of computer vision. VSR refers to the process of reconstructing a high-resolution (HR) video from its low-resolution (LR) counterpart. This not only enhances the visual experience for end-users but also has practical applications in surveillance, medical imaging, and digital restoration of archival footage. The challenge of video super-resolution lies in accurately inferring high-frequency details that are not present in the low-resolution source. Early techniques in VSR were largely based on interpolation methods, which often resulted in artifacts such as blurring and aliasing. The advent of machine learning, particularly deep learning, has revolutionized this field by enabling more sophisticated approaches that can learn complex mappings from LR to HR content, utilizing temporal coherence and contextual information across video frames. In this thesis, we embrace the recent advances in deep neural networks (DNNs) to address the challenges in VSR research, aiming to achieve effective and efficient real-world video super-resolution performance. In Chapter 1, we review related work and discuss the contributions and organization of this thesis. In Chapter 2, we develop an efficient VSR algorithm with a flow-guided deformable attention propagation module, targeting the real-time online setting, which fits the needs of online applications such as streaming media and video surveillance. 
The flow-guided deformable attention propagation module leverages the corresponding priors provided by a fast optical flow network in deformable attention computation and consequently helps propagate recurrent state information effectively and efficiently. The proposed algorithm achieves state-of-the-art results on widely used VSR benchmark datasets in terms of effectiveness and efficiency. In Chapter 3, we build the first real-world VSR dataset, aiming to bridge the synthetic-to-real gap in previous VSR research and pave the way towards real-world VSR. To train VSR models more effectively on the proposed dataset, we propose a decomposition-based loss that considers the characteristics of the constructed dataset. Experiments validate that VSR models trained on our RealVSR dataset demonstrate better visual quality than those trained on synthetic datasets under real-world settings, and that they also exhibit good generalization capability in cross-camera tests. In Chapter 4, we propose a motion-guided latent diffusion (MGLD) based VSR algorithm, which achieves highly competitive real-world VSR results, exhibiting perceptually much more realistic details with fewer flickering artifacts than existing state-of-the-art methods. To tackle the ill-posedness of the real-world VSR problem, we leverage the powerful generation capability provided by a large pre-trained text-to-image diffusion model. To improve temporal consistency, we propose a motion-guided sampling strategy and fine-tune the variational decoder with an innovative sequence-oriented loss. In Chapter 5, we develop a VSR algorithm by harnessing the capabilities of a robust video diffusion generation prior, achieving temporally consistent and high-quality VSR outcomes. To effectively utilize the video diffusion prior for VSR, we implement a ControlNet-style mechanism to manage the sequence VSR process and fine-tune the model on a large-scale video dataset. 
The powerful video diffusion prior, coupled with our control design, enables the model to achieve commendable VSR results at the segment level. To ensure seamless continuity between segments and maintain long-term consistency, we further design a segment-based recurrent inference pipeline. In summary, our work contributes to the development of VSR research by designing a more efficient network architecture to boost the efficiency of real-world VSR algorithms, addressing the lack of real-world VSR benchmarking datasets, and developing more effective real-world VSR algorithms by exploiting image and video diffusion priors. |
Rights: | All rights reserved |
Access: | open access |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/13239