Author: Li, Ruihuang
Title: Label and computation-efficient deep segmentation for images and point clouds
Advisors: Zhang, Lei (COMP)
Degree: Ph.D.
Year: 2024
Subject: Computer vision
Machine learning
Image processing -- Digital techniques
Hong Kong Polytechnic University -- Dissertations
Department: Department of Computing
Pages: xvii, 138 pages : color illustrations
Language: English
Abstract: Segmentation is an essential task in computer vision that aims to divide an image or point cloud into disjoint sets of pixels or points corresponding to different objects or regions. It has a wide range of applications, such as autonomous driving, robotics, augmented reality, and medical image analysis. Deep learning techniques such as convolutional neural networks (CNNs) and Transformers have significantly improved the accuracy of image and point cloud segmentation, but their computational complexity and demand for vast amounts of labeled data remain bottlenecks for many real-time applications. Researchers have proposed various methods to address these limitations, yet striking a balance between segmentation accuracy and label efficiency remains challenging. In this thesis, we propose a series of approaches that improve the label and computation efficiency of model training while maintaining high segmentation accuracy.
In Chapter 1, we review popular label- and computation-efficient methods for deep 2D/3D segmentation, and discuss the contributions and organization of this thesis. In Chapter 2, we focus on transferring a model trained on a synthetic source domain to a real target domain. To alleviate the domain shift between the source and target domains, we propose a class-balanced pixel-level self-labeling mechanism, which simultaneously clusters pixels and rectifies pseudo labels with the obtained cluster assignments. In Chapter 3, we focus on instance segmentation with box annotations as supervision. We develop a Semantic-aware Instance Mask (SIM) generation paradigm. Instead of relying heavily on local pair-wise affinities among neighboring pixels, we construct a group of category-wise feature centroids as prototypes to identify foreground objects and assign them semantic-level pseudo labels. In Chapter 4, we further improve the computation efficiency of an existing instance segmentation model. To alleviate the increased computation and memory costs caused by large masks, we develop a Mask Switch Module (MSM) with negligible computational cost that selects the most suitable mask resolution for each instance, achieving high efficiency while maintaining high segmentation accuracy. Finally, in Chapter 5, we study the application of label-efficient segmentation algorithms to open-vocabulary 3D scene understanding. We leverage large vision-language models to extract scene descriptions and category information, building a text modality that serves as supervision. We then co-embed the different modalities into a common space to maximize their synergistic benefits.
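The prototype-based pseudo-labeling idea from Chapter 3 can be illustrated with a minimal sketch: pixels are compared against category-wise feature centroids (prototypes), and each pixel receives the semantic label of its most similar prototype, with low-similarity pixels left unlabeled. The function name, similarity threshold, and use of cosine similarity here are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def prototype_pseudo_labels(features, prototypes, threshold=0.5):
    """Assign semantic pseudo labels by cosine similarity to class prototypes.

    features:   (N, D) array of pixel embeddings
    prototypes: (C, D) array of category-wise feature centroids
    Returns an (N,) array of labels in [0, C), with -1 where the best
    similarity falls below `threshold` (treated as ignore/background).
    """
    # Normalize both sets so the dot product equals cosine similarity.
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8)
    sim = f @ p.T                       # (N, C) cosine similarities
    labels = sim.argmax(axis=1)         # nearest prototype per pixel
    labels[sim.max(axis=1) < threshold] = -1  # reject uncertain pixels
    return labels
```

In practice, the prototypes would be running means of features from confidently labeled regions, and the ignore label (-1) keeps ambiguous pixels out of the training loss.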
The methods proposed in this thesis significantly improve the label and computation efficiency of segmentation while maintaining high accuracy. Experimental results demonstrate their superiority over state-of-the-art segmentation methods. Our research provides a promising direction for future work on deep learning-based segmentation with limited annotations and computational resources.
Rights: All rights reserved
Access: open access

Files in This Item:
File: 7654.pdf
Description: For All Users
Size: 49.36 MB
Format: Adobe PDF


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13202