Author: Guo, Jingcai
Title: Learning robust visual-semantic mapping for zero-shot learning
Advisors: Guo, Song (COMP)
Degree: Ph.D.
Year: 2021
Subject: Machine learning; Artificial intelligence; Hong Kong Polytechnic University -- Dissertations
Department: Department of Computing
Pages: xix, 130 pages : color illustrations
Language: English
Abstract: Zero-shot learning (ZSL) aims at recognizing unseen class examples (e.g., images) with knowledge transferred from seen classes. This is typically achieved by exploiting a semantic feature space shared by both seen and unseen classes, e.g., attributes or word vectors, as the bridge. In ZSL, the common practice is to train a mapping function between the visual and semantic feature spaces with labeled seen class examples. At inference time, the learned mapping function is reused on unseen class examples, and their class labels are assigned according to some metric over their semantic relations. However, the visual and semantic feature spaces are generally independent and lie on entirely different manifolds. Under such a paradigm, ZSL models may easily suffer from the domain shift problem when constructing and reusing the mapping function, which becomes the major challenge in ZSL. In this thesis, we explore effective ways to mitigate the domain shift problem and learn a robust mapping function between the visual and semantic feature spaces. We focus on fully empowering the semantic feature space, which is one of the key building blocks of ZSL. First, we consider adaptively adjusting the semantic feature space to rectify it for ZSL. Conventional ZSL models generally regard the semantic feature space as unchangeable during training. However, it can be observed that when visual features are mapped to the semantic feature space, the obtained semantic features are usually overly concentrated. This deficiency limits a model's ability to adapt and generalize to more unseen classes. Human understanding of things improves continually. Similarly, we argue that the semantic feature space also needs to be dynamically adjusted to accommodate more robust learning of the mapping function. Specifically, the adjustment is conducted on both the class prototypes and the global distribution during training.
Moreover, we propose to combine the adjustment with a cycle mapping, formulating the training as a more efficient framework that not only rectifies the semantic feature space but also speeds up the training process.
Second, we consider aligning the manifold structures of the visual and semantic feature spaces by expanding the semantic features. Compared with our first method, which directly adjusts the semantic feature space to rectify it, the expansion process is more conservative and softer. Specifically, we build upon an autoencoder-based model to generate several auxiliary semantic features that are combined with the original ones to expand the space. Additionally, the expansion is jointly guided by an embedded manifold extracted from the visual feature space, which retains its geometrical and structural information. By aligning the two feature spaces, the trained mapping function becomes more robust and better matched, which significantly mitigates the domain shift problem. Last, we take a further step to explore and empower the correlation between the visual and semantic feature spaces from a more fine-grained perspective. Unlike most existing work and our previous methods, we decompose an image example into several parts and use an example-level graph-based model to measure and utilize certain relations among these parts. Taking advantage of recently developed graph neural networks, we further formulate ZSL as a graph-to-semantic mapping problem, which can better exploit the visual-semantic correlation and the local substructure information in each example. In summary, this thesis targets fully empowering the semantic feature space and designs effective solutions to mitigate the domain shift problem and hence obtain a more robust visual-semantic mapping function for ZSL. Extensive experiments on various datasets demonstrate the effectiveness of our proposed methods.
Rights: All rights reserved
Access: open access

Files in This Item:
File: 5558.pdf
Description: For All Users
Size: 13.22 MB
Format: Adobe PDF



Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/11097