Author: Zhang, Rongchen
Title: An advancing framework for complex fabric image retrieval : from multi-scale feature fusion to efficient attention-based matching
Advisors: Wong, Wai Keung (SFT)
Degree: Ph.D.
Year: 2025
Department: School of Fashion and Textiles
Pages: viii, 126 pages : color illustrations
Language: English
Abstract: Accurate fabric image retrieval is important to the modern textile and apparel industries because it directly affects inventory management, production efficiency, and design innovation. Despite advances in content-based image retrieval, existing methods struggle with complex fabrics such as plaid, lace, striped, and printed fabrics, whose patterns range from geometric and floral to abstract designs. These fabrics are difficult to retrieve because of high intra-class variability and diverse real-world conditions. Existing methods exhibit three key limitations: (1) insufficient multi-scale feature representation, which limits their ability to capture fine-grained textures and global structures simultaneously; (2) inconsistent feature representations across hierarchical retrieval frameworks, which cause mismatches between the coarse and fine retrieval stages; and (3) inefficient local feature matching due to high computational complexity.
To address these challenges, this study proposes a novel framework to enhance the precision, efficiency, and robustness of fabric image retrieval systems. The framework comprises three main components. First, the Multi-Scale Local Descriptors Fusion (MLDF) method is introduced. This method employs a multi-scale feature extraction module with convolutional layers of varying receptive fields to capture both fine-grained textures and broader structural patterns. Feature fusion is achieved through Mixer Modules, which integrate token and channel dimensions to ensure a comprehensive representation. Additionally, a progressive triplet mining strategy is implemented to optimize feature embeddings, enhancing the discriminative power for complex fabric patterns.
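The triplet-based embedding objective mentioned above can be illustrated with a minimal sketch. It uses the standard triplet margin loss; the abstract does not describe the mining schedule, so the `hardness` parameter and the easy-to-hard selection in `pick_negative` are assumptions made only for illustration and are not the thesis's actual strategy.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def pick_negative(anchor, negatives, hardness):
    """Pick a negative by difficulty (hypothetical helper).

    hardness=0.0 selects the farthest (easiest) negative and
    hardness=1.0 selects the closest (hardest) one. Raising hardness
    over training is an assumed reading of "progressive" mining.
    """
    ranked = sorted(negatives, key=lambda n: euclidean(anchor, n), reverse=True)
    index = round(hardness * (len(ranked) - 1))
    return ranked[index]

# Toy 2-D embeddings, made up for the example.
anchor = [0.0, 0.0]
positive = [0.1, 0.0]
negatives = [[3.0, 0.0], [1.0, 0.0], [0.2, 0.0]]

easy = pick_negative(anchor, negatives, hardness=0.0)
hard = pick_negative(anchor, negatives, hardness=1.0)
loss_easy = triplet_loss(anchor, positive, easy)  # margin already satisfied
loss_hard = triplet_loss(anchor, positive, hard)  # margin violated, loss > 0
```

In this toy case the easy negative contributes no loss, while the hard negative still violates the margin, which is why harder negatives tend to carry more training signal.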
Second, the Hierarchical Two-Stage Retrieval Framework is proposed. This framework utilizes global descriptors for efficient coarse retrieval and local descriptors for fine-grained refinement. An enhanced triplet loss function ensures consistency across feature spaces, improving inter-class separability and intra-class compactness. This approach effectively balances computational efficiency with retrieval accuracy, addressing the limitations of both single-stage and traditional two-stage methods.
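The coarse-then-fine idea can be sketched as follows. This is only an illustration of generic two-stage retrieval: the dictionary layout, `shortlist_size`, and the best-match `local_score` are assumptions, not the scoring functions used in the thesis.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def local_score(query_locals, cand_locals):
    """Mean, over query local descriptors, of the best cosine match in the candidate."""
    best = [max(cosine(q, c) for c in cand_locals) for q in query_locals]
    return sum(best) / len(best)

def two_stage_retrieve(query, gallery, shortlist_size=2):
    """Stage 1 shortlists by global descriptor; stage 2 re-ranks by local descriptors."""
    coarse = sorted(gallery,
                    key=lambda item: cosine(query["global"], item["global"]),
                    reverse=True)
    shortlist = coarse[:shortlist_size]
    refined = sorted(shortlist,
                     key=lambda item: local_score(query["locals"], item["locals"]),
                     reverse=True)
    return [item["id"] for item in refined]

# Toy gallery with made-up descriptors.
gallery = [
    {"id": "plaid_a", "global": [1.0, 0.0], "locals": [[1.0, 0.0], [0.0, 1.0]]},
    {"id": "plaid_b", "global": [0.9, 0.1], "locals": [[1.0, 1.0], [1.0, 1.0]]},
    {"id": "lace_c", "global": [0.0, 1.0], "locals": [[1.0, 0.0], [0.0, 1.0]]},
]
query = {"global": [0.9, 0.1], "locals": [[1.0, 0.0], [0.0, 1.0]]}

ranking = two_stage_retrieve(query, gallery)
```

Here `plaid_b` leads the coarse stage, but the local stage promotes `plaid_a`. The item `lace_c` never reaches the second stage even though its local descriptors match, which shows why the two feature spaces need to agree.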
Third, the Efficient Local Feature Matching via Cross Attention (ELFM) method is developed. This method incorporates a Cross Attention Module to dynamically align local features between query and candidate images, capturing fine-grained relationships. By combining a learnable attention mechanism with feature aggregation, ELFM enables precise similarity computation and re-ranking. The method achieves high precision and recall, making it suitable for industrial applications requiring accurate and efficient retrieval.
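As a rough sketch of how cross attention can align local features, the example below applies plain scaled dot-product attention from query features to candidate features and averages the resulting cosine similarities. It omits the learnable projections that a trained module would have, and `match_score` is a hypothetical aggregation, so it should not be read as the ELFM architecture.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(query_feats, cand_feats):
    """For each query feature, return the attention-weighted sum of candidate features."""
    scale = math.sqrt(len(query_feats[0]))
    attended = []
    for q in query_feats:
        weights = softmax([dot(q, c) / scale for c in cand_feats])
        attended.append([sum(w * c[d] for w, c in zip(weights, cand_feats))
                         for d in range(len(q))])
    return attended

def match_score(query_feats, cand_feats):
    """Mean cosine similarity between each query feature and its attended summary."""
    attended = cross_attend(query_feats, cand_feats)
    return sum(cosine(q, a) for q, a in zip(query_feats, attended)) / len(query_feats)

# Toy local features, made up for the example.
query_feats = [[4.0, 0.0], [0.0, 4.0]]
same_pattern = [[0.0, 4.0], [4.0, 0.0]]   # same features in a different order
other_pattern = [[4.0, 0.0], [4.0, 0.0]]  # lacks the second feature

score_same = match_score(query_feats, same_pattern)
score_other = match_score(query_feats, other_pattern)
```

Because attention looks up matching features wherever they occur, the reordered candidate scores close to 1, while the candidate missing one feature scores lower. Sorting a shortlist by this score gives a simple re-ranking.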
The proposed methods were evaluated on a newly constructed dataset of 2,448 fabric images covering 537 pattern categories, reflecting real-world variability in both controlled and natural environments. Experimental results demonstrate significant improvements in retrieval precision, recall, and computational efficiency compared to traditional handcrafted features and existing deep learning models. This study makes three contributions: (1) advancing multi-scale feature fusion techniques for complex texture representation, (2) pioneering a hierarchical two-stage framework with feature space consistency, and (3) optimizing local feature matching via attention mechanisms. These innovations bridge the gap between theoretical robustness and industrial scalability, offering a unified solution for accurate and efficient fabric retrieval.
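For readers unfamiliar with the retrieval metrics named above, precision and recall are commonly computed over the top k results. The abstract does not state which cut-offs or variants were used, so the function below is a generic sketch with hypothetical names and toy labels.

```python
def precision_recall_at_k(ranked_labels, query_label, num_relevant, k):
    """Precision@k and recall@k for a single query.

    ranked_labels: category labels of the retrieved images, best first.
    num_relevant: number of gallery images sharing the query's category.
    """
    top_k = ranked_labels[:k]
    hits = sum(1 for label in top_k if label == query_label)
    precision = hits / k
    recall = hits / num_relevant if num_relevant else 0.0
    return precision, recall

# Toy ranking: 2 of the top 3 results share the query's category,
# and the gallery is assumed to hold 4 relevant images in total.
ranked = ["plaid", "lace", "plaid", "stripe", "plaid"]
precision, recall = precision_recall_at_k(ranked, "plaid", num_relevant=4, k=3)
```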
Rights: All rights reserved
Access: open access

Files in This Item:
File: 8780.pdf
Description: For All Users
Size: 3.35 MB
Format: Adobe PDF


Copyright Undertaking

As a bona fide Library user, I declare that:

  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/14329