Author: | Zhang, Cong |
Title: | Deep learning for object detection in remote sensing imagery |
Advisors: | Lam, Kin-man Kenneth (EEE) ; Chan, Yui-lam (EEE) |
Degree: | Ph.D. |
Year: | 2024 |
Subject: | Remote sensing ; Image processing -- Digital techniques ; Machine learning ; Hong Kong Polytechnic University -- Dissertations |
Department: | Department of Electrical and Electronic Engineering |
Pages: | xxxvi, 196 pages : color illustrations |
Language: | English |
Abstract: | Object detection in remote sensing (RS) imagery is a fundamental task in the computer vision and Earth observation communities. The aim is to identify all geospatial object instances of specific categories in a given RS image. Due to its wide range of real-world applications, this task has been extensively studied in recent years. Moreover, with the rapid development of deep learning techniques, promising improvements have been achieved in RS object detection. However, existing RS object detectors still struggle with unsatisfactory accuracy in complex scenarios and low model efficiency. Several challenging issues limit their potential and practicality, including inefficient detection frameworks and sub-optimal training paradigms. To this end, this thesis primarily focuses on developing deep learning-based methods to address these challenges in RS object detection, from two perspectives: more advanced and efficient detection frameworks, and more tailored and robust training paradigms. Firstly, single-stage detection frameworks have gained increasing attention due to their higher model efficiency compared with two-stage frameworks. However, they often suffer from degraded accuracy, especially in challenging RS scenarios. This degradation is caused by coarseness in two aspects, i.e., coarse features and coarse training samples. This thesis investigates such coarseness in depth and proposes a novel progressive coarse-to-fine single-stage framework, namely CoF-Net. Composed of two parallel branches, CoF-Net refines coarse features and coarse samples for improved accuracy, while maintaining the high model flexibility of a single-stage detection framework. Secondly, vision Transformers have demonstrated more powerful representational capabilities than convolutional neural networks (CNNs) across various vision tasks. 
However, significant obstacles have limited the performance and adaptation of Transformers in the context of RS object detection, such as high computational complexity, a lack of inductive knowledge, and difficulty in learning arbitrary orientations. This thesis explores an efficient and inductive Transformer framework with angle tokenization, namely EIA-Transformer, to overcome these obstacles. By reducing the inherent spatial redundancy in RS images and encoding appropriate inductive bias through local convolutions, EIA-Transformer achieves significantly higher accuracy than CNN-based detectors at a lower computational cost, making a notable contribution to the development of Transformer-based detection frameworks in RS scenarios. Thirdly, several critical conflicts exist between current pretraining and fine-tuning paradigms for RS object detection, including domain inconsistency, task objective mismatch, and architecture misalignment. These conflicts degrade the final detection performance. To address these issues, this thesis proposes a novel pretraining paradigm specifically for RS object detection, namely strong-classification weak-localization (SCWL) pretraining. Without introducing any additional detection components, SCWL pretraining performs explicit instance-level pretraining of the whole detection framework in the RS domain. Furthermore, it consistently yields remarkable performance improvements for different RS object detectors across a wide variety of settings. Fourthly, recent research has focused on enhancing the robustness of deep learning-based RS object detectors against adversarial attacks. However, conventional adversarial training typically improves adversarial robustness at the expense of clean accuracy, and also imposes heavy computational and memory burdens during fine-tuning, which is unfavorable in real-world RS applications. 
This thesis investigates a novel perspective to remedy these problems, namely structured adversarial self-supervised (SASS) pretraining. SASS pretraining hierarchically incorporates structured knowledge into pretraining, thereby simultaneously benefiting both the clean accuracy and the adversarial robustness of RS object detectors. |
Rights: | All rights reserved |
Access: | open access |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/13281