Full metadata record
DC Field: Value (Language)
dc.contributor: Department of Computing (en_US)
dc.contributor.advisor: Zhang, Lei (COMP) (en_US)
dc.creator: Li, Ruihuang
dc.identifier.uri: https://theses.lib.polyu.edu.hk/handle/200/13202
dc.language: English (en_US)
dc.publisher: Hong Kong Polytechnic University (en_US)
dc.rights: All rights reserved (en_US)
dc.title: Label and computation-efficient deep segmentation for images and point clouds (en_US)
dcterms.abstract: Segmentation is an essential task in computer vision that aims to divide an image or point cloud into disjoint sets of pixels or points corresponding to different objects or regions. Segmentation has a wide range of applications, such as autonomous driving, robotics, augmented reality, and medical image analysis. Deep learning techniques such as convolutional neural networks (CNNs) and Transformers have significantly improved the accuracy of image and point cloud segmentation, but their computational complexity and need for vast amounts of labeled data remain bottlenecks for many real-time applications. Researchers have proposed various methods to address these limitations, yet striking a balance between segmentation accuracy and label efficiency remains a challenging issue. In this thesis, we propose a series of approaches to improve the label and computation efficiency of model training while maintaining high segmentation accuracy. (en_US)
dcterms.abstract: In Chapter 1, we review popular label- and computation-efficient methods for deep 2D/3D segmentation, and discuss the contributions and organization of this thesis. In Chapter 2, we focus on transferring a model trained on a synthetic source domain to a real target domain. To alleviate the domain shift between the source and target domains, we propose a class-balanced pixel-level self-labeling mechanism, which simultaneously clusters pixels and rectifies pseudo labels with the obtained cluster assignments. In Chapter 3, we focus on instance segmentation with box annotations as supervision. We develop a Semantic-aware Instance Mask (SIM) generation paradigm. Instead of relying heavily on local pair-wise affinities among neighboring pixels, we construct a group of category-wise feature centroids as prototypes to identify foreground objects and assign them semantic-level pseudo labels. In Chapter 4, we further improve the computation efficiency of existing instance segmentation models. To alleviate the increased computation and memory costs caused by using large masks, we develop a Mask Switch Module (MSM) with negligible computational cost that selects the most suitable mask resolution for each instance, achieving high efficiency while maintaining high segmentation accuracy. Finally, in Chapter 5, we study the application of label-efficient segmentation algorithms to open-vocabulary 3D scene understanding. We leverage large vision-language models to extract scene descriptions and category information to build the text modality as supervision. Then we co-embed the different modalities into a common space to maximize their synergistic benefits. (en_US)
dcterms.abstract: The methods proposed in this thesis significantly improve the label and computation efficiency of segmentation while maintaining high accuracy. Experimental results demonstrate their superiority over state-of-the-art segmentation methods. Our research provides a promising direction for future work on deep learning-based segmentation with limited annotations and computational resources. (en_US)
dcterms.extent: xvii, 138 pages : color illustrations (en_US)
dcterms.isPartOf: PolyU Electronic Theses (en_US)
dcterms.issued: 2024 (en_US)
dcterms.educationalLevel: Ph.D. (en_US)
dcterms.educationalLevel: All Doctorate (en_US)
dcterms.LCSH: Computer vision (en_US)
dcterms.LCSH: Machine learning (en_US)
dcterms.LCSH: Image processing -- Digital techniques (en_US)
dcterms.LCSH: Hong Kong Polytechnic University -- Dissertations (en_US)
dcterms.accessRights: open access (en_US)

Files in This Item:
File: 7654.pdf
Description: For All Users
Size: 49.36 MB
Format: Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13202