Explore semantic-channel correlation for multi-label zero-shot learning

Liu, Ziming

Full metadata record

DC Field	Value	Language
dc.contributor	Department of Computing	en_US
dc.contributor.advisor	Guo, Song (COMP)	en_US
dc.creator	Liu, Ziming	-
dc.identifier.uri	https://theses.lib.polyu.edu.hk/handle/200/13130	-
dc.language	English	en_US
dc.publisher	Hong Kong Polytechnic University	en_US
dc.rights	All rights reserved	en_US
dc.title	Explore semantic-channel correlation for multi-label zero-shot learning	en_US
dcterms.abstract	Multi-label zero-shot learning (MLZSL) aims to predict multiple unseen classes in complex image scenes by using knowledge of seen classes. Unlike traditional zero-shot learning (ZSL), it requires not only sharing semantic features so that the model can transfer from the seen domain to the unseen domain, but also understanding the correlations among multiple classes. The common practice in MLZSL is either to mine spatial information and build spatial correlations among classes, or to use the principal directions of image features to predict multiple unseen classes. However, unreasonable extraction of, or over-reliance on, spatial features leads to a loss of semantic information and to class imbalance. The approach based on the principal vectors of the image, in turn, neglects diversity, which may limit its performance in a multi-label environment. To address these challenges, this thesis re-examines class semantics and inter-class relationships from the perspective of channel response and tackles the class imbalance that arises during feature extraction. It also focuses on building diverse and discriminative principal vectors.	en_US
dcterms.abstract	First, we consider that all classes have equal status within the visual-semantic space of MLZSL, yet traditional methods based on spatial information ignore the class imbalance problem. Neglecting “minor classes”, which appear less frequently and have fewer samples, leads to a lack of semantic comprehensiveness. We therefore design a new feature pyramid paradigm that preserves the “minor classes” during feature extraction, and a channel-wise pyramid feature attention that strengthens their responses. Finally, to maintain the semantic associations within the image, we adopt semantic attention. This framework not only preserves the responses of “minor classes” but also establishes element-wise correlations among semantic vectors.	en_US
dcterms.abstract	Second, we find that classes with similar semantics have related responses in the feature channels, and that the channel responses of different classes can themselves serve as a special form of semantic correlation. We therefore predict multiple unseen classes by shuffling the feature maps into groups, mining the channel responses of each group, and encoding them into semantic vectors. This design avoids the loss of scene information caused by the excessive mining of spatial information in earlier methods. It also reduces semantic extraction from two-dimensional spatial information to one-dimensional channel responses, which greatly reduces the computational overhead of the model and improves inference speed.	en_US
dcterms.abstract	Finally, existing methods that use the principal vectors of image features for MLZSL generally suffer from limited semantic diversity and discrimination, which leaves the semantic vectors short of class-related information during inference. To address this, we integrate channel-wise global and local features and design a unique semantic representation. We also use graph convolutional networks (GCNs) to construct associations among the semantic vectors, improving the multi-label expressiveness that existing methods lack.	en_US
dcterms.abstract	In summary, the goal of this thesis is to mine semantic information fully and effectively in a multi-label environment and to design solutions that alleviate the lack of semantic information. Extensive experiments on several large-scale multi-label datasets demonstrate the effectiveness of the proposed methods.	en_US
dcterms.extent	xviii, 143 pages : color illustrations	en_US
dcterms.isPartOf	PolyU Electronic Theses	en_US
dcterms.issued	2024	en_US
dcterms.educationalLevel	Ph.D.	en_US
dcterms.educationalLevel	All Doctorate	en_US
dcterms.LCSH	Machine learning	en_US
dcterms.LCSH	Artificial intelligence	en_US
dcterms.LCSH	Semantics	en_US
dcterms.LCSH	Hong Kong Polytechnic University -- Dissertations	en_US
dcterms.accessRights	open access	en_US
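
The abstract describes predicting classes by shuffling feature maps into groups, reducing each group to channel responses, and encoding those responses into semantic vectors. The following is a minimal sketch of that idea only, not the thesis implementation. The function name `channel_group_scores`, the spatial-mean pooling, the linear `projection` encoder, and the max-over-groups dot-product scoring are all assumptions made for illustration.

```python
import random


def channel_group_scores(feature_map, class_embeddings, projection,
                         num_groups, seed=0):
    """Score classes from grouped channel responses (illustrative only).

    feature_map:      C channels, each a flat list of spatial activations
    class_embeddings: K class vectors of length D
    projection:       (C // num_groups) rows of length D; a hypothetical
                      stand-in for a learned encoder
    """
    channels = len(feature_map)
    group_size = channels // num_groups
    dim = len(projection[0])

    # Shuffle the channel order before cutting it into equal groups.
    order = list(range(channels))
    random.Random(seed).shuffle(order)

    # Channel response (assumed here to be the spatial mean): each 2-D
    # map collapses to one scalar, giving a 1-D response per channel.
    responses = [sum(feature_map[c]) / len(feature_map[c]) for c in order]

    scores = [float("-inf")] * len(class_embeddings)
    for g in range(num_groups):
        group = responses[g * group_size:(g + 1) * group_size]
        # Encode the group's responses as a D-dimensional semantic vector.
        vector = [
            sum(group[i] * projection[i][d] for i in range(group_size))
            for d in range(dim)
        ]
        # Compare with every class embedding; keep the best group per class.
        for k, embedding in enumerate(class_embeddings):
            similarity = sum(v * e for v, e in zip(vector, embedding))
            scores[k] = max(scores[k], similarity)
    return scores


# Toy input: 4 identical channels with 2 spatial positions each.
feature_map = [[1.0, 3.0] for _ in range(4)]
projection = [[1.0, 0.0], [0.0, 1.0]]
class_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = channel_group_scores(feature_map, class_embeddings, projection,
                              num_groups=2)
```

Because each group is summarized by per-channel scalars rather than a full spatial map, the scoring step works on one-dimensional responses, which is the source of the reduced overhead that the abstract mentions.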

Files in This Item:

File	Description	Size	Format
7582.pdf	For All Users	11.32 MB	Adobe PDF

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13130