Author: Sun, Zhengwentai
Title: A deep learning approach for fashion image processing with controllable synthesis and flexible editing
Advisors: Mok, P. Y. (SFT)
Degree: M.Phil.
Year: 2024
Department: School of Fashion and Textiles
Pages: xiv, 127 pages : color illustrations
Language: English
Abstract: Fashion design typically involves composing elements and concepts: designers select and harmonize colors, patterns, and prints, and consider functional attributes such as collar type, sleeve length, and overall fit. This process, which reflects both the designer's creativity and market preferences, usually requires iterative modification and can be time-consuming even for experts. Although recent advances in generative models offer efficient and effective ways of processing fashion images, applying these models in design remains challenging. Generative models primarily map random noise to an image; the process is arbitrary and uncontrollable, requiring multiple attempts before an image meeting specific requirements is obtained.
A natural way to improve control over the generated garment images is to provide detailed supervisory information. For instance, given a fashion garment dataset with detailed annotations of each design element, a generative model could learn a conditional mapping from specific elements to the desired garment image. An obvious drawback of this solution, however, is the tedious annotation it requires, which is time-consuming and expensive. Moreover, such labels usually treat each design element as a discrete attribute assigned to a category; this limits the model's flexibility in the design process, since many design elements, e.g., colors and textures, are hard to categorize.
To address the above-mentioned challenges in controllability and flexibility, this study develops generative models that incorporate a decoupling method in data collection and training. The overall idea is to decouple a garment image into different modalities of data, each representing different design elements. For instance, the HED model is used to extract sketches that represent spatial-level attributes such as collars, lengths, and overall shapes, while at the texture level, randomly cropped image patches are employed. These decoupled data, derived partially from the original garment images, are used to train generative models capable of reconstructing the original images. The trained models then allow the synthesized garment image to be controlled by selecting specific design elements at the inference stage.
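The decoupling step might look like the following minimal sketch. It uses the HEDdetector from the controlnet_aux package as a stand-in for the thesis's HED model (the actual toolchain is not specified in the abstract), and plain random cropping for the texture patches; function names and parameters are illustrative.

    # Minimal sketch of the decoupling step: one garment image becomes
    # a spatial condition (HED sketch) plus texture conditions (patches).
    # HEDdetector here is a stand-in, not necessarily the thesis's setup.
    import random
    from PIL import Image
    from controlnet_aux import HEDdetector

    hed = HEDdetector.from_pretrained("lllyasviel/Annotators")

    def decouple(garment: Image.Image, patch_size: int = 64, n_patches: int = 4):
        """Split a garment image into a sketch (spatial-level attributes
        such as collar, length, shape) and randomly cropped texture patches."""
        sketch = hed(garment)
        w, h = garment.size  # assumes w, h > patch_size
        patches = []
        for _ in range(n_patches):
            x = random.randint(0, w - patch_size)
            y = random.randint(0, h - patch_size)
            patches.append(garment.crop((x, y, x + patch_size, y + patch_size)))
        return sketch, patches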
Building on this capability, this thesis introduces an image processing system comprising two models, a controllable generation model and a flexible editing model, each targeting a different fashion image processing task. The first model, called SGDiff, focuses on control over texture: it leverages randomly cropped texture patches and text prompts to reconstruct garments and, once trained, uses texture patches as a decoupled style condition to control the synthesized garment images. The second model, called CoDE-GAN, edits the shape of fashion images; it learns the editing function by reconstructing masked images using sketch maps. The two models can work independently or be integrated into one system, enabling effective and flexible control over the generation and editing of fashion images. Both models have been comprehensively evaluated to demonstrate their advantages over other state-of-the-art models.
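The reconstruction-based training of both models can be illustrated with the following hedged sketch. The denoiser below is a toy placeholder, not SGDiff's actual architecture (it even omits the timestep embedding), and `generator` in the editing step is a hypothetical inpainting network standing in for CoDE-GAN; only the diffusers DDPMScheduler calls are real library APIs.

    # Toy illustration of the two reconstruction objectives described above.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from diffusers import DDPMScheduler

    class ToyDenoiser(nn.Module):
        """Predicts added noise from the noisy image plus two decoupled
        conditions: a text embedding and a texture-patch embedding.
        Timestep embedding is omitted for brevity; this is not SGDiff."""
        def __init__(self, ch=3, cond_dim=8):
            super().__init__()
            self.patch_enc = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                           nn.Flatten(), nn.Linear(ch, cond_dim))
            self.net = nn.Conv2d(ch + 2 * cond_dim, ch, 3, padding=1)

        def forward(self, x_t, t, text_emb, patch):
            style = self.patch_enc(patch)                 # (B, cond_dim)
            cond = torch.cat([text_emb, style], dim=1)    # (B, 2*cond_dim)
            cond_map = cond[:, :, None, None].expand(-1, -1, *x_t.shape[2:])
            return self.net(torch.cat([x_t, cond_map], dim=1))

    sched = DDPMScheduler(num_train_timesteps=1000)
    model = ToyDenoiser()

    def generation_step(x0, text_emb, patch):
        """Texture-conditioned denoising objective: reconstruct the
        garment from noise given text and patch conditions."""
        t = torch.randint(0, sched.config.num_train_timesteps, (x0.size(0),))
        noise = torch.randn_like(x0)
        x_t = sched.add_noise(x0, noise, t)
        return F.mse_loss(model(x_t, t, text_emb, patch), noise)

    def editing_step(generator, x0, sketch, mask):
        """Editing objective: reconstruct the masked region of x0 guided
        by a sketch map. `generator` is a hypothetical inpainting network."""
        x_rec = generator(x0 * (1.0 - mask), sketch)
        return F.l1_loss(x_rec * mask, x0 * mask)

At inference time, the two pieces compose naturally: a garment is synthesized from chosen texture patches and a text prompt, then its shape is edited by masking a region and supplying a new sketch.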
Rights: All rights reserved
Access: open access

Files in This Item:
File: 8407.pdf (For All Users), 6.66 MB, Adobe PDF


Copyright Undertaking

As a bona fide Library user, I declare that:

  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13948