Author: | Feng, Yuchao |
Title: | A study on explainable end-to-end autonomous driving |
Advisors: | Sun, Yuxiang (ME); Chu, Henry (ME) |
Degree: | Ph.D. |
Year: | 2025 |
Department: | Department of Mechanical Engineering |
Pages: | xix, 128 pages : color illustrations |
Language: | English |
Abstract: | In recent years, end-to-end networks have emerged as a promising approach to advanced autonomous driving. Unlike modular pipelines, which divide autonomous driving into separate modules, this approach learns to drive by directly mapping raw sensory data to driving decisions (or control outputs). Compared to modular systems, end-to-end networks avoid the accumulation of errors across modules and scale better to complex scenarios. Despite these advantages, a major limitation of this approach is its lack of explainability: the outputs of end-to-end networks are generally not interpretable, making it difficult to understand why a specific input produces a given output. This raises significant concerns about the safety and reliability of such systems, hindering their broader application and acceptance in real-world traffic environments.

Within this context, this study develops three methods to enhance the explainability of end-to-end autonomous driving networks. First, natural-language explanations are proposed: a novel explainable network, named Natural-Language Explanation for Decision Making (NLE-DM), is designed to jointly predict driving decisions and natural-language explanations. While natural-language explanations are an effective way to explain driving decisions, they often fall short of revealing the internal processes of the network; visual explanations, in contrast, can provide insights into the network's inner workings. Therefore, to further enhance explainability, we propose combining natural-language and visual explanations in a multimodal approach: an explainable end-to-end network, named Multimodal Explainable Autonomous Driving (Multimodal-XAD), is designed to jointly predict driving decisions and multimodal environment descriptions. Finally, we revisit the concept of visual explanations and introduce a novel Bird's-Eye-View (BEV) perception method, named PolarPoint-BEV, which leverages a polar-coordinate representation to better illustrate how the network perceives spatial relationships in the driving environment.

The three methods not only enhance the explainability of end-to-end networks but also address distinct scientific problems in autonomous driving. For NLE-DM, the effect of natural-language explanations on driving-decision prediction is investigated; the results demonstrate that jointly predicting natural-language explanations improves the accuracy of driving-decision predictions. For Multimodal-XAD, the issue of error accumulation in the downstream tasks of vision-based BEV perception is addressed by incorporating both context and local information before predicting driving decisions and environment descriptions; experimental results show that combining the two kinds of information enhances the prediction performance of both tasks. For PolarPoint-BEV, the limitations of traditional BEV maps are identified and addressed: traditional BEV maps treat all regions equally, risking the oversight of safety-critical details, and use dense grids, incurring high computational costs. PolarPoint-BEV instead prioritizes regions closer to the ego vehicle, ensuring greater attention to critical areas, while its sparse structure yields a more lightweight representation.
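The polar-point idea can be pictured with a short sketch. The Python snippet below is a minimal illustration only, assuming a ring-based layout whose angular sampling density decays with range; the function name generate_polar_points and all parameter values are hypothetical and are not taken from the thesis.

# Hypothetical sketch of a polar-point BEV layout (the function name,
# ring counts, and density schedule are assumptions, not the thesis code).
import numpy as np

def generate_polar_points(num_rings=8, max_range=50.0,
                          near_points=32, far_points=8):
    """Place sample points on concentric rings around the ego vehicle.

    Rings nearer the ego vehicle get denser angular sampling, so
    safety-critical nearby regions receive more attention, while the
    whole set stays far sparser than a dense Cartesian grid.
    """
    points = []
    for ring in range(num_rings):
        radius = max_range * (ring + 1) / num_rings
        # Linearly decay the number of points per ring with distance.
        n = int(round(np.interp(ring, [0, num_rings - 1],
                                [near_points, far_points])))
        angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
        points.extend((radius * np.cos(t), radius * np.sin(t)) for t in angles)
    return np.asarray(points)  # (total_points, 2), ego-centric x, y

points = generate_polar_points()
print(points.shape)  # (160, 2), versus 40,000 cells in a 200 x 200 dense grid

A semantic label (for example, occupancy or road-user class) would then be predicted at each point; the near-field-focused, sparse layout is what makes the representation both attention-prioritized and lightweight.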
To evaluate the impact of PolarPoint-BEV on explainability and driving performance, a multi-task end-to-end driving network, XPlan, is proposed to jointly predict control commands and polar point BEV maps. |
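The joint-prediction idea shared by NLE-DM, Multimodal-XAD, and XPlan can likewise be sketched. The PyTorch snippet below is a hypothetical illustration, not the networks described in the thesis: the layer sizes, head names, and the bag-of-words stand-in for language generation are all simplifying assumptions.

# Hypothetical PyTorch sketch of a joint multi-task head (layer sizes,
# vocabularies, and head names are assumptions, not the thesis networks).
import torch
import torch.nn as nn

class JointDrivingHead(nn.Module):
    """Predict several outputs from one shared feature vector."""

    def __init__(self, feat_dim=512, num_decisions=6,
                 explanation_vocab=1000, num_bev_points=160):
        super().__init__()
        # The shared feature comes from an upstream camera/BEV encoder.
        self.decision_head = nn.Linear(feat_dim, num_decisions)
        # Toy stand-in for language generation: one score per explanation
        # token (the thesis generates full natural-language sentences).
        self.explanation_head = nn.Linear(feat_dim, explanation_vocab)
        # One occupancy logit per polar point (cf. PolarPoint-BEV).
        self.bev_head = nn.Linear(feat_dim, num_bev_points)

    def forward(self, shared_feat):
        return {
            "decision": self.decision_head(shared_feat),
            "explanation": self.explanation_head(shared_feat),
            "polar_bev": self.bev_head(shared_feat),
        }

head = JointDrivingHead()
out = head(torch.randn(4, 512))  # a batch of 4 shared feature vectors
print({k: tuple(v.shape) for k, v in out.items()})

In training, the per-task losses would be summed so that supervision from the explanation task shapes the shared features, which is consistent with the NLE-DM finding that jointly predicting explanations improves decision accuracy.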
Rights: | All rights reserved |
Access: | open access |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/13711