Full metadata record
DC Field: Value (Language)
dc.contributor: Department of Computing (en_US)
dc.contributor.advisor: Zhang, Lei (COMP) (en_US)
dc.creator: Li, Ruihuang
dc.identifier.uri: https://theses.lib.polyu.edu.hk/handle/200/13202
dc.language: English (en_US)
dc.publisher: Hong Kong Polytechnic University (en_US)
dc.rights: All rights reserved (en_US)
dc.title: Label and computation-efficient deep segmentation for images and point clouds (en_US)
dcterms.abstract: Segmentation is an essential task in computer vision that aims to divide an image or point cloud into disjoint sets of pixels or points corresponding to different objects or regions. Segmentation has a wide range of applications, such as autonomous driving, robotics, augmented reality, and medical image analysis. Deep learning techniques such as convolutional neural networks (CNNs) and Transformers have significantly improved the accuracy of image and point cloud segmentation, but their computational complexity and need for vast amounts of labeled data remain bottlenecks for many real-time applications. Researchers have proposed various methods to address these limitations, yet striking a balance between segmentation accuracy and label efficiency remains a challenging issue. In this thesis, we propose a series of approaches to improve the label and computation efficiency of model training while maintaining high segmentation accuracy. (en_US)
dcterms.abstract: In Chapter 1, we review popular label- and computation-efficient methods for deep 2D/3D segmentation, and discuss the contributions and organization of this thesis. In Chapter 2, we focus on transferring a model trained on a synthetic source domain to a real target domain. To alleviate the domain shift between the source and target domains, we propose a class-balanced pixel-level self-labeling mechanism, which simultaneously clusters pixels and rectifies pseudo labels with the obtained cluster assignments. In Chapter 3, we focus on instance segmentation with box annotations as supervision. We develop a Semantic-aware Instance Mask (SIM) generation paradigm. Instead of relying heavily on local pair-wise affinities among neighboring pixels, we construct a group of category-wise feature centroids as prototypes to identify foreground objects and assign them semantic-level pseudo labels. In Chapter 4, we further improve the computation efficiency of existing instance segmentation models. To alleviate the increased computation and memory costs caused by using large masks, we develop a Mask Switch Module (MSM) with negligible computational cost that selects the most suitable mask resolution for each instance, achieving high efficiency while maintaining high segmentation accuracy. Finally, in Chapter 5, we study the application of label-efficient segmentation algorithms to open-vocabulary 3D scene understanding. We leverage large vision-language models to extract scene descriptions and category information to build the text modality as supervision. Then we co-embed the different modalities into a common space to maximize their synergistic benefits. (en_US)
dcterms.abstract: The methods proposed in this thesis significantly improve the label and computation efficiency of segmentation while maintaining high accuracy. Experimental results demonstrate their superiority over state-of-the-art segmentation methods. Our research provides a promising direction for future work on deep learning-based segmentation with limited annotations and computational resources. (en_US)
dcterms.extent: xvii, 138 pages : color illustrations (en_US)
dcterms.isPartOf: PolyU Electronic Theses (en_US)
dcterms.issued: 2024 (en_US)
dcterms.educationalLevel: Ph.D. (en_US)
dcterms.educationalLevel: All Doctorate (en_US)
dcterms.LCSH: Computer vision (en_US)
dcterms.LCSH: Machine learning (en_US)
dcterms.LCSH: Image processing -- Digital techniques (en_US)
dcterms.LCSH: Hong Kong Polytechnic University -- Dissertations (en_US)
dcterms.accessRights: open access (en_US)

Files in This Item:
File: 7654.pdf
Description: For All Users
Size: 49.36 MB
Format: Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13202