Full metadata record
DC Field | Value | Language
dc.contributor | Department of Mechanical Engineering | en_US
dc.contributor.advisor | Sun, Yuxiang (ME) | en_US
dc.creator | Feng, Zhen | -
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/12834 | -
dc.language | English | en_US
dc.publisher | Hong Kong Polytechnic University | en_US
dc.rights | All rights reserved | en_US
dc.title | A study on semantic scene understanding with multi-modal fusion for autonomous driving | en_US
dcterms.abstract | Traffic scene understanding is the basis for the safe driving of autonomous vehicles. Semantic segmentation assigns a class to each pixel in an image, which makes it one of the key methods for traffic scene understanding. Because traffic scenes are complex and variable, single-modal data often cannot cover all scenes. Semantic segmentation algorithms with multi-modal fusion can address the performance degradation that occurs when single-modal data is corrupted by environmental noise. Traffic scene understanding based on multi-modal fusion has therefore received increasing attention, for example the fusion of Red-Green-Blue (RGB) images with thermal images and the fusion of RGB images with depth images. The aim of this study is to investigate the segmentation of negative obstacles in traffic scenes and the segmentation of all-day traffic scenes by fusing multi-modal data. | en_US
dcterms.abstract | Although current multi-modal fusion networks for negative obstacle segmentation have achieved acceptable results, their encoders use only one structure to extract one kind of feature, such as local features. Owing to the limited receptive field, the local features extracted by a convolutional network cannot fully represent the global information in an image, while the global information extracted by a self-attention module cannot capture local detail as well as local features. To address this issue, we propose the Multi-modal Attention Fusion Network (MAFNet) for the segmentation of road potholes with the fusion of RGB images and disparity images. Specifically, we combine a convolutional network and a transformer network as the encoder to extract features from images. In addition, we design fusion modules based on attention modules to fuse the features of RGB images and disparity images. Experiments show that MAFNet achieves better results than existing state-of-the-art networks. | en_US
dcterms.abstract | Large-scale datasets are necessary for training high-quality networks. To address the scarcity of datasets for negative obstacle segmentation with multi-modal fusion, we build and release a dataset for the segmentation of negative obstacles with RGB images and depth images. To reduce the workload of manual labelling, we manually labelled 745 images and generated coarse labels for the remaining 3000 images using the existing dataset and the labelled images. Current multi-modal fusion networks also suffer from slow inference when dealing with large input data. To address this issue, we propose the Channel and Position-wise Knowledge Distillation (CPKD) framework. Specifically, we replace the heavyweight encoder of the teacher network with a lightweight network and introduce a downsampling layer at the beginning of the student network to reduce the amount of data. We design Channel and Position-wise Distillation (CPD) modules to transfer knowledge from the teacher network to the student network. The experimental results show that the CPKD framework greatly improves the inference speed of the network and enables the student network to achieve satisfactory performance. | en_US
dcterms.abstract | To address the blurred edges of thermal images and the issue that the performance of RGB-thermal fusion networks is easily affected by changes in the alignment between the two modalities, we propose the Cross-modal Edge-privileged Knowledge Distillation (CEKD) framework for segmentation. This framework transfers the edge detection capability from a multi-modal teacher network to a thermal-image student network by knowledge distillation. The main aim of the CEKD framework is to improve the segmentation accuracy of the student network. We introduce an edge detection module into the teacher network and use edge labels as privileged information to train the teacher network. We also design a Thermal Enhancement (TE) module for the student network to improve the contrast between high-temperature objects and the low-temperature background. The experimental results show that the thermal-only student network trained with the CEKD framework learns the edge detection capability from the teacher network, and that the student network achieves better performance than the single-modal network for the segmentation of traffic scenes with only thermal images. | en_US
dcterms.extent | xx, 114 pages : color illustrations | en_US
dcterms.isPartOf | PolyU Electronic Theses | en_US
dcterms.issued | 2024 | en_US
dcterms.educationalLevel | Ph.D. | en_US
dcterms.educationalLevel | All Doctorate | en_US
dcterms.LCSH | Image segmentation | en_US
dcterms.LCSH | Image analysis | en_US
dcterms.LCSH | Image processing -- Digital techniques | en_US
dcterms.LCSH | Automated vehicles | en_US
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | en_US
dcterms.accessRights | open access | en_US
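
The MAFNet abstract above describes fusion modules built on attention that merge RGB and disparity features. As an illustration only, the following PyTorch sketch shows one common way such an attention-weighted fusion step can be written; the class name AttentionFusion, the squeeze-and-excitation style gate, and the hyperparameters are assumptions and do not reproduce the thesis implementation.

# Minimal sketch (assumed design, not the thesis code): channel-attention
# fusion of same-shaped RGB and disparity feature maps.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse two same-shaped feature maps with a channel-attention gate (hypothetical)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Squeeze-and-excitation style gate over the concatenated features.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # global context per channel
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, 1),
            nn.Sigmoid(),                                   # per-channel weights in [0, 1]
        )
        self.proj = nn.Conv2d(2 * channels, channels, 1)    # project back to one stream

    def forward(self, rgb_feat: torch.Tensor, disp_feat: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb_feat, disp_feat], dim=1)         # (B, 2C, H, W)
        x = x * self.gate(x)                                # re-weight each modality's channels
        return self.proj(x)                                 # fused feature, (B, C, H, W)

if __name__ == "__main__":
    fuse = AttentionFusion(channels=64)
    rgb = torch.randn(1, 64, 40, 80)     # feature map from the RGB branch
    disp = torch.randn(1, 64, 40, 80)    # feature map from the disparity branch
    print(fuse(rgb, disp).shape)         # torch.Size([1, 64, 40, 80])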
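
The CPKD abstract above mentions Channel and Position-wise Distillation (CPD) modules that transfer knowledge from the teacher to the student. The record does not give the exact losses, so the sketch below uses a formulation that is common in the distillation literature: a KL divergence between per-channel distributions over spatial positions and between per-pixel distributions over channels; the function names and the temperature tau are assumptions.

# Minimal sketch (assumed formulation): channel-wise and position-wise
# distillation losses between matching teacher and student feature maps.
import torch
import torch.nn.functional as F

def channel_wise_kd(student, teacher, tau=1.0):
    """KL divergence between per-channel distributions over spatial positions."""
    b, c, h, w = student.shape
    s = F.log_softmax(student.view(b, c, h * w) / tau, dim=2)
    t = F.softmax(teacher.view(b, c, h * w) / tau, dim=2)
    return F.kl_div(s, t, reduction="batchmean") * (tau ** 2)

def position_wise_kd(student, teacher, tau=1.0):
    """KL divergence between per-pixel distributions over channels."""
    s = F.log_softmax(student / tau, dim=1)
    t = F.softmax(teacher / tau, dim=1)
    return F.kl_div(s, t, reduction="batchmean") * (tau ** 2)

if __name__ == "__main__":
    teacher_logits = torch.randn(2, 19, 60, 120)   # e.g. teacher segmentation logits
    student_logits = torch.randn(2, 19, 60, 120)   # matching student logits
    loss = channel_wise_kd(student_logits, teacher_logits) \
         + position_wise_kd(student_logits, teacher_logits)
    print(loss.item())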
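
The CEKD abstract above introduces edge labels as privileged information for training the teacher's edge detection module. The labelling procedure is not described in this record, so the sketch below shows one simple, commonly used way to derive binary edge maps from semantic segmentation labels; the function name and the neighbourhood size are assumptions.

# Minimal sketch (assumed procedure): mark a pixel as an edge if any neighbour
# within a small window belongs to a different semantic class.
import torch
import torch.nn.functional as F

def edges_from_labels(label, kernel_size=3):
    """label: (B, H, W) integer class map -> (B, 1, H, W) binary edge map."""
    one_hot = F.one_hot(label.long()).permute(0, 3, 1, 2).float()   # (B, K, H, W)
    pad = kernel_size // 2
    # Max-pooling dilates each class mask; it differs from the original
    # exactly where two classes meet within the window.
    dilated = F.max_pool2d(one_hot, kernel_size, stride=1, padding=pad)
    return (dilated != one_hot).any(dim=1, keepdim=True).float()

if __name__ == "__main__":
    toy = torch.zeros(1, 8, 8, dtype=torch.long)
    toy[:, :, 4:] = 1                       # two-class toy label map
    print(edges_from_labels(toy)[0, 0])     # 1s along the vertical class boundary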

Files in This Item:
File | Description | Size | Format
7284.pdf | For All Users | 9.26 MB | Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/12834