Author: Wang, Keze
Title: Self-driven learning for large-scale object detection
Advisors: Zhang, Lei (COMP)
Degree: Ph.D.
Year: 2019
Subject: Hong Kong Polytechnic University -- Dissertations; Image processing; Computer vision; Machine learning
Department: Department of Computing
Pages: xvi, 119 pages : color illustrations
Language: English
Abstract: Aiming to find instances of real-world objects in images or video sequences, object detection has attracted great interest in the computer vision research community. Benefiting from the rapid advancement of deep convolutional neural networks (CNNs), remarkable progress has been achieved in object detection. Currently, most efforts have been spent on designing powerful network architectures, e.g., residual networks (ResNet) [38] and single shot multibox detectors (SSD) [72], to improve feature learning and computation speed. However, existing object detection methods require massive data collection and annotation, which is quite expensive. Hence, how to leverage large-scale unlabeled data to improve detection performance is a crucial and long-standing problem in object detection. To address this issue, many active learning (AL) methods have been proposed, which retrieve a small number of representative unlabeled samples for manual annotation. However, these AL methods ignore the remaining majority of samples (i.e., those with low uncertainty or high prediction confidence). In this thesis, we aim to develop cost-effective methods that mine both the majority and the minority of unlabeled samples, minimizing user annotation effort while training more powerful object detectors.
First, we naturally combine AL and self-paced learning (SPL) [57] to automatically pseudo-label the majority of high-confidence samples and incorporate them into training with a weak expert re-certification strategy. This combination can be formulated as a concise active SPL optimization problem, which advances SPL by supplementing it with a rational dynamic curriculum constraint. The required number of annotated samples is significantly decreased without sacrificing performance, and user effort is dramatically reduced compared with other state-of-the-art AL techniques. In addition, the mixture of SPL and AL improves not only classifier accuracy but also robustness against noisy data.
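As a rough illustration of the first contribution, the sketch below pairs the standard SPL hard-weighting rule (a sample whose loss falls below the threshold λ gets weight 1, otherwise 0) with an uncertainty-based AL query over the samples SPL leaves out. This is a minimal sketch under stated assumptions, not the thesis's active SPL formulation: the per-sample loss is assumed to be -log of the detector's confidence (confidences must lie in (0, 1]), the threshold and budget are arbitrary inputs, and the function names are hypothetical. The weak expert re-certification step and the dynamic curriculum constraint are not modeled.

```python
import math


def spl_weights(losses, lam):
    # Closed-form minimizer of sum_i v_i * L_i - lam * sum_i v_i over v_i in {0, 1}
    # (the hard SPL regularizer): v_i = 1 exactly when L_i < lam.
    return [1 if loss < lam else 0 for loss in losses]


def active_self_paced_split(confidences, conf_threshold, annotate_budget):
    """Hypothetical helper: split unlabeled samples into three index lists.

    Returns (pseudo, query, deferred):
      pseudo   -- easy samples to pseudo-label with the detector's own prediction
      query    -- least confident samples to send for manual annotation (AL)
      deferred -- everything else, left for a later round
    """
    # Assumption: loss of the predicted label is -log(confidence).
    losses = [-math.log(c) for c in confidences]
    # A confidence threshold t corresponds to the SPL age parameter lam = -log(t).
    lam = -math.log(conf_threshold)
    v = spl_weights(losses, lam)

    pseudo = [i for i, vi in enumerate(v) if vi == 1]
    remaining = [i for i, vi in enumerate(v) if vi == 0]
    # Highest loss (least confident) first; list.sort is stable, so ties keep input order.
    remaining.sort(key=lambda i: -losses[i])
    query = sorted(remaining[:annotate_budget])
    deferred = sorted(remaining[annotate_budget:])
    return pseudo, query, deferred
```

For example, `active_self_paced_split([0.98, 0.4, 0.91, 0.1, 0.6], 0.9, 1)` pseudo-labels samples 0 and 2, sends sample 3 (the least confident) for annotation, and defers samples 1 and 4. Lowering `conf_threshold` from one round to the next raises λ and admits progressively harder samples, which is the usual way an SPL curriculum grows.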
Second, we present a principled self-supervised sample mining (SSM) scheme to address the practical challenges of object detection. Specifically, our SSM scheme automatically discovers and pseudo-labels reliable region proposals to enhance the object detector via cross-image validation, i.e., pasting these proposals into different labeled images to comprehensively measure their scores under different image contexts. Building on SSM, we propose a new AL framework that gradually incorporates unlabeled or partially labeled data into model learning while minimizing the annotation effort of users.
Third, we develop a principled active sample mining (ASM) framework, which involves a selectively switchable sample selection mechanism to determine whether an unlabeled sample should be manually annotated via AL or automatically pseudo-labeled via a novel self-learning process. The proposed process is compatible with mini-batch-based training (i.e., using a batch of unlabeled or partially labeled data as a one-time input). Notably, our method is suitable for detecting object categories that are not seen in the unlabeled data during the learning process.
Lastly, we develop a novel memory network module named the convolutional memory block (CMB), which empowers CNNs with a memory mechanism that enhances their pattern abstraction capability by reusing their rich implicit convolutional structures and the spatial correlations among non-sequential training samples. Specifically, the proposed CMB consists of one internal memory (i.e., a set of feature maps) and three specific controllers, which together enable a powerful yet efficient memory manipulation mechanism. The CMB is intended to capture and store representative dependencies or correlations among training samples according to the specific learning task, and to employ these stored dependencies to enhance the representations of convolutional layers. In this way, the CMB allows the CNN architecture to be lightweight and to require less training data.
In summary, this thesis focuses on incrementally exploiting large-scale unlabeled or partially labeled data to improve object detection performance. Extensive experiments on public benchmarks demonstrate that the proposed approaches achieve performance comparable to alternative methods with significantly fewer annotations.
Rights: All rights reserved
Access: open access

Files in This Item:
991022208058303411.pdf (For All Users, 5.19 MB, Adobe PDF)

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/9934