Explore semantic-channel correlation for multi-label zero-shot learning

Liu, Ziming

Author:	Liu, Ziming
Title:	Explore semantic-channel correlation for multi-label zero-shot learning
Advisors:	Guo, Song (COMP)
Degree:	Ph.D.
Year:	2024
Subject:	Machine learning; Artificial intelligence; Semantics; Hong Kong Polytechnic University -- Dissertations
Department:	Department of Computing
Pages:	xviii, 143 pages : color illustrations
Language:	English
Abstract:	Multi-label zero-shot learning (MLZSL) aims to predict multiple unseen classes in complex image scenes by using knowledge of seen classes. Unlike traditional zero-shot learning (ZSL), MLZSL must not only share semantic features so that the model can transfer from the seen domain to the unseen domain, but also capture the correlation between multiple classes. The common practice in MLZSL is either to mine spatial information and build the spatial correlation of the various classes, or to use the principal directions of image features to predict multiple unseen classes. However, poorly designed extraction of spatial features, or over-reliance on them, leads to a loss of semantic information and to class imbalance. The second approach relies on the principal vectors of the image and neglects diversity, which may limit its performance in a multi-label setting. To address these challenges, this thesis re-examines class semantics and inter-class relationships from the perspective of channel response and tackles the class imbalance that arises during feature extraction. It also focuses on building diverse and discriminative principal vectors. Firstly, we argue that all classes have equal status within the visual-semantic space of MLZSL, yet traditional methods based on spatial information ignore the class imbalance problem. Neglecting “minor classes”, which appear less frequently and have fewer samples, leaves the extracted semantics incomplete. We therefore design a new feature pyramid paradigm that preserves the “minor classes” during feature extraction, and a channel-wise pyramid feature attention that strengthens their responses. To maintain the semantic associations within the image, we further adopt semantic attention. This framework not only preserves the responses of “minor classes” but also establishes element-wise correlation among semantic vectors. Secondly, we find that classes with similar semantics have related responses in the feature channels, and that the channel responses of different classes can themselves serve as a special form of semantic correlation. We therefore predict multiple unseen classes by shuffling the feature maps into groups, mining the channel responses of each group of features, and encoding them into semantic vectors. This design avoids the loss of scene information that earlier methods incur through excessive mining of spatial information. It also reduces semantic extraction from two-dimensional spatial information to one-dimensional channel responses, which greatly reduces the computational overhead of the model and improves inference speed. Finally, existing methods that use the principal vectors of image features for MLZSL generally suffer from limited semantic diversity and discrimination, so their semantic vectors lack class-related information during inference. To address this, we integrate global and local features along the channel dimension and design a unique semantic representation. We also use graph convolutional networks (GCN) to construct associations between different semantic vectors, which improves the multi-label representation that existing methods lack. In summary, the goal of this thesis is to mine semantic information fully and effectively in a multi-label setting and to design solutions that alleviate the lack of semantic information. Extensive experiments on several large-scale multi-label datasets demonstrate the effectiveness of the proposed methods.
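As an illustration only, the second contribution described in the abstract (shuffling feature maps into groups, reducing each channel to a one-dimensional response, and encoding each group into a semantic vector) might be sketched as follows. This is not the thesis's implementation: the function names, the use of global average pooling as the "channel response", the seeded random shuffle, and the linear projection are all assumptions made for this example.

```python
import random


def channel_responses(feature_map):
    """Average each channel over its spatial grid (assumed pooling choice).

    feature_map is a nested list shaped C x H x W; the result has C floats,
    i.e. the 2-D spatial information is reduced to a 1-D channel response.
    """
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def shuffle_into_groups(num_channels, num_groups, seed=0):
    """Shuffle channel indices and split them into equally sized groups."""
    if num_channels % num_groups != 0:
        raise ValueError("num_channels must be divisible by num_groups")
    indices = list(range(num_channels))
    random.Random(seed).shuffle(indices)  # hypothetical shuffle strategy
    size = num_channels // num_groups
    return [indices[i * size:(i + 1) * size] for i in range(num_groups)]


def encode_group(responses, group, weights):
    """Project one group's channel responses to a semantic vector.

    weights is a hypothetical matrix given as rows (semantic_dim x group_size);
    in a trained model these would be learned parameters.
    """
    values = [responses[i] for i in group]
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


if __name__ == "__main__":
    # Toy feature map: 4 channels, each 2 x 2.
    fmap = [
        [[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.0], [0.0, 0.0]],
        [[1.0, 1.0], [1.0, 1.0]],
        [[4.0, 4.0], [4.0, 4.0]],
    ]
    responses = channel_responses(fmap)
    groups = shuffle_into_groups(num_channels=4, num_groups=2)
    weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3-dim semantic space
    for group in groups:
        print(group, encode_group(responses, group, weights))
```

In a zero-shot setting, the resulting per-group vectors would typically be compared against class word embeddings to score unseen classes; that scoring step is omitted here because the abstract does not specify it.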
Rights:	All rights reserved
Access:	open access

Files in This Item:

File	Description	Size	Format
7582.pdf	For All Users	11.32 MB	Adobe PDF



Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13130