Author: Zhang, Chuliang
Title: Semantic 3D occupancy prediction from 2D scenes
Advisors: Chau, Lap-pui (EEE)
Degree: M.Sc.
Year: 2025
Department: Department of Electrical and Electronic Engineering
Pages: 1 volume (unpaged) : color illustrations
Language: English
Abstract: In this work, I propose a framework for predicting semantic 3D occupancy grids from 2D scenes using deep learning models. The framework combines depth estimation and semantic segmentation to generate accurate semantic 3D representations of the environment. Specifically, it pairs the LapDepth model for depth estimation with the Fast-SCNN model for semantic segmentation, extracting per-pixel depth values and semantic labels, such as object categories and background elements, from a single RGB image. The depth value at each pixel is used to back-project the 2D scene into a 3D point cloud, and the point cloud is then voxelized to produce a 3D occupancy grid. Each voxel in the grid is annotated with the semantic label read from the segmentation map, enriching the representation with the object categories occupying the grid. The framework also extends from single static images to dynamic video streams: it processes video frames in real time, continuously performing depth estimation and semantic segmentation and generating semantic 3D occupancy information frame by frame. I deliberately selected relatively fast depth estimation and semantic segmentation models so that videos containing large numbers of frames can be processed quickly. Experiments demonstrate the effectiveness of the method: I evaluate the depth estimation and semantic segmentation outputs, and I test the semantic 3D occupancy results generated with different depth estimation and semantic segmentation models to demonstrate the modularity of the framework. I also measure the deviation in the ratio between actual object sizes and the corresponding object sizes in the generated point cloud, and I test the performance of the semantic 3D occupancy scenes under different environmental conditions.
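The two geometric steps the abstract describes, back-projecting per-pixel depth into a point cloud and voxelizing it with per-voxel semantic labels, can be sketched as below. This is a minimal illustrative sketch, not code from the thesis: it assumes known pinhole camera intrinsics (fx, fy, cx, cy), a metric depth map as the depth-estimation output, a per-pixel class-ID map as the segmentation output, and a simple majority vote to assign one label per voxel.

```python
import numpy as np

def backproject_to_point_cloud(depth, labels, fx, fy, cx, cy):
    """Back-project per-pixel depth into a labeled 3D point cloud.

    depth  : (H, W) float array of metric depth values (assumed output
             of a monocular depth model such as LapDepth)
    labels : (H, W) int array of semantic class IDs (assumed output
             of a segmentation model such as Fast-SCNN)
    fx, fy, cx, cy : pinhole camera intrinsics (assumed known)
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx          # pinhole model: X = (u - cx) * Z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    point_labels = labels.reshape(-1)
    valid = points[:, 2] > 0       # drop pixels with no valid depth
    return points[valid], point_labels[valid]

def voxelize(points, point_labels, voxel_size=0.05):
    """Quantize points into a sparse semantic occupancy grid.

    Returns a dict mapping integer voxel index (i, j, k) to the
    majority semantic label of the points falling inside that voxel.
    """
    idx = np.floor(points / voxel_size).astype(np.int64)
    grid = {}
    for key, lab in zip(map(tuple, idx), point_labels):
        grid.setdefault(key, []).append(lab)
    # Majority vote over the labels of the points in each voxel.
    return {k: int(np.bincount(v).argmax()) for k, v in grid.items()}
```

A sparse dictionary keyed by voxel index is used here instead of a dense array so that only occupied voxels are stored; the majority vote is one plausible way to resolve conflicting labels at voxel boundaries, since the thesis abstract does not specify how ties are handled.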
Rights: All rights reserved
Access: restricted access
Files in This Item:
File | Description | Size | Format
---|---|---|---
8318.pdf | For All Users (off-campus access for PolyU Staff & Students only) | 1.63 MB | Adobe PDF
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/13909