Author: Ma, Jianqi
Title: Exploring the effective utilization of text priors for scene text image super-resolution
Advisors: Zhang, Lei (COMP)
Degree: Ph.D.
Year: 2024
Department: Department of Computing
Pages: xxii, 148 pages : color illustrations
Language: English
Abstract: The scene text image super-resolution (STISR) task aims to enhance the resolution of text images and recover visually pleasing, legible text, benefiting downstream higher-level text processing. In this thesis, we present four studies that advance STISR research in four aspects: recognition information injection, deformation-robust prior-based learning, a new benchmark dataset and evaluation protocol, and diffusion priors. The first two studies discuss model designs that activate the text recognition prior for STISR and how to strengthen that prior for higher-quality scene text recovery. The third study collects a novel bilingual dataset and proposes a novel evaluation setting and an edge-aware learning baseline for the STISR task, pushing STISR toward a more challenging and practical inference scenario. Building on the evaluation settings of the third study, the fourth study applies a latent diffusion model (LDM) to further boost the performance of dense text image recovery.
In Chapter 1, we introduce the background and history of scene text image super-resolution and text image recovery, including the benchmark datasets and state-of-the-art approaches, and then outline the organization of the thesis. In Chapter 2, we present an innovative CNN-based STISR pipeline called TPGSR. This pipeline incorporates the guidance of the recognition text prior and introduces multiple solutions for enhancing the text prior, i.e., distilling information from the high-resolution (HR) text prior and multi-stage refinement. Building upon this, in Chapter 3, we delve into the design of TATT, a transformer-based approach that addresses the challenge of deformation-robust prior guidance activation. In addition, a text structure consistency (TSC) loss is proposed to ensure consistent recovery between normal text and deformed text. Chapter 4 showcases our efforts in collecting a novel Chinese-English STISR dataset known as RealCE. This dataset allows for the exploration of more complex structured text recovery, with well-annotated transcripts and localization boxes of text instances in the global images. Additionally, we propose a practical and challenging image recovery evaluation protocol to alleviate the artifacts that text line recovery may introduce, and an edge-aware recovery pipeline, which uses the edge map as prior information and is specifically tailored for global dense text images. In Chapter 5, we introduce a model architecture that applies a latent diffusion model to further enhance STISR performance on global dense text image inputs. To better activate the diffusion prior, we unfreeze the denoising UNet so that it can sufficiently learn the complex text structure, and propose a signal-to-noise ratio (SNR) loss reweighting strategy to stabilize the diffusion model training. An inference strategy is also applied for size-variant inputs to achieve more stable text recovery results. This novel approach aims to push the boundaries of text image super-resolution by leveraging the power of latent diffusion modeling. In Chapter 6, we conclude the work done in the thesis and discuss future research directions based on it.
To conclude, the research presented in this thesis contributes substantially to the development of the STISR field by solving significant problems, raising specific challenges, proposing novel model designs that boost STISR performance on normal and deformed text, and benchmarking dense global text image recovery with a new dataset and evaluation protocol.
Rights: All rights reserved
Access: open access

Files in This Item:
File        Description      Size       Format
7703.pdf    For All Users    64.4 MB    Adobe PDF


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13248