Author: | Li, Chu Tak |
Title: | Learning approaches for scene localization and quality scene reconstruction |
Advisors: | Siu, Wan-chi (EIE) Lun, Pak Kong Daniel (EIE) |
Degree: | M.Phil. |
Year: | 2023 |
Subject: | Automated vehicles -- Data processing Automated vehicles -- Control Machine learning Hong Kong Polytechnic University -- Dissertations |
Department: | Department of Electronic and Information Engineering |
Pages: | xiv, 141 pages : color illustrations |
Language: | English |
Abstract: | Vision-based autonomous driving techniques are popular in both academia and industry because of the highly cost-effective commodity cameras with high quality output images and the information richness of images. Global Navigation Satellite Systems (GNSS) is well-known for many real-world ego-localization and other related applications. However, GNSS suffers from reflection and blocking due to dense concrete buildings and tall trees, especially in the densely populated urban areas, like Hong Kong. There are also other solutions using high-level sensors like Lidar, Radar and 360 RGB-D cameras. Nevertheless, these solutions still have their respective limitations and are not widely used in various commercial products. Therefore, various technologies including visual place recognition and reconstruction methods discussed in this thesis will be required for achieving a comprehensive autonomous driving system. Place recognition or localization is an important element to autonomous driving system. Accurate ego location information is crucial for either removing past accumulated errors or future planning. The challenges lie in the variations in appearance, speeds, lighting environments, perspectives and objects. Therefore, we develop a fast algorithm for place recognition, for which fast tracking with the use of historical information and effective representation of a frame have been comprehensively studied to achieve satisfactory recognition performance and minimize computational cost. We name the use of historical information as a tubing strategy which emphasizes the temporal correlation between consecutive input frames. We take the advantages of recent deep learning techniques; also remove two main barriers of Convolutional Neural Networks (CNNs) , i.e., heavy computational cost and large amount of labelled data, such that deep learning techniques can be used for efficient place recognition methods. We study lightweight CNN models to offer efficient feature extraction and improve an existing automatic training data generation module by considering more variations in conditions. We further propose a way to adaptively use the historical information to tackle the tasks of unknown initial location and efficient recognition. The proposed methods outperform other state-of-the-art methods in terms of both recognition performance and complexity. To ensure the quality of the extracted features from images, we also study object removal by means of deep learning-based image inpainting for scene reconstruction. By removing unwanted objects like moving vehicles and pedestrians in images, we can have clean images for place recognition. We propose Deep Generative Inpainting Network (DeepGIN) and inpainting model with Multi-Dilation Fusion Block (MDFB) and auxiliary attention learning branch which seek for a better balance of pixel-wise accuracy and visual quality. We show that our proposed models can handle wild images by testing them on several publicly available datasets, Flickr-Faces-HQ (FFHQ), The Oxford Buildings and Places2 datasets. We demonstrate that our inpainting results can be used in other high-level computer vision tasks such as face verification and semantic segmentation. We believe that the inpainting results can also be used in place recognition. For future research work, we target at developing a more comprehensive recognition system for which our inpainting models are used as pre-processing module to obtain better input images and our tubing strategy is applied to the post-processing stage to obtain better recognition performance. Apart from combining the techniques discussed in this thesis, we would like to develop an online learning strategy to keep the understanding of a path up to date for further enhancing life-long recognition performance. |
Rights: | All rights reserved |
Access: | open access |
Copyright Undertaking
As a bona fide Library user, I declare that:
- I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
- I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
- I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.
By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/12268