| Author: | Xu, Kaishuai |
| Title: | Clinical process-aware medical dialogue systems |
| Advisors: | Li, Wenjie Maggie (COMP) |
| Degree: | Ph.D. |
| Year: | 2025 |
| Department: | Department of Computing |
| Pages: | xviii, 154 pages : color illustrations |
| Language: | English |
| Abstract: | Dialogue systems, driven by advances in Large Language Models (LLMs), enable human-like interactions across a wide range of user-facing tasks. In recent years, Medical Dialogue Systems (MDS) have gained increasing attention as demand for telemedicine has grown since the onset of the COVID-19 pandemic. These systems have expanded access to diagnostic and prescribing services via online consultations, helping to alleviate pressure on healthcare professionals. Although significant progress has been made, a substantial gap remains between current MDS and the standard of doctor-patient communication expected in clinical practice. Most existing MDS struggle to emulate aspects of the clinical process, such as structured clinical communication, comprehensive differential diagnosis, and prototypical clinical reasoning processes. As a result, their behavior often diverges from clinical practice and falls short of clinician-like performance in terms of clinical accuracy, reliability, and safety. In the medical domain, a reliable dialogue system should model the clinical process and incorporate its core aspects into system design, which in turn improves safety and guideline concordance and helps build user trust. In this thesis, we present a clinical process-aware medical dialogue system that emulates clinical practice, aiming to deliver trustworthy, clinician-like interactions. Our goal is to advance clinical process-aware dialogue systems by proposing effective, novel approaches for this widely needed but underexplored area. Specifically, we investigate three primary research questions (RQs): (1) How to model the critical information flow (e.g., medical entities and dialogue acts) in a medical dialogue and incorporate key entities and acts into response generation? (2) How to develop a comprehensive differential diagnosis and leverage it to improve the reliability and interpretability of system responses? 
(3) How to emulate clinicians’ internal reasoning processes and improve a system’s reasoning capabilities with external medical knowledge? To tackle these challenges, we advance the design of MDS from surface-level similarity to internal alignment with clinical practice. This thesis makes several contributions addressing these fundamental questions, presented in three parts. In the first part, we address RQ (1) and identify that extracting critical information and capturing transitions in the information flow are essential for building MDS. We propose the Dual Flow Enhanced Medical (DFMed) dialogue generation framework. At each dialogue turn, we extract medical entities and doctor dialogue acts from the dialogue history and model their transitions via an entity-centric graph flow and a sequential act flow, respectively. We further introduce an interweaving component to strengthen interactions between the two flows. Experiments show that explicitly modeling the two flows helps predict key information for the next turn, thereby improving response accuracy. The second part of the thesis focuses on RQ (2) and presents a comprehensive approach to differential diagnosis. Unlike prior approaches that rely on a generative model’s implicit diagnostic reasoning, we build a separate diagnosis module to improve diagnostic reliability and transparency. We propose IADDx, a framework that first generates a differential diagnosis and then uses the resulting candidate diseases to guide response generation. Inspired by research on diagnostic reasoning, we design a two-stage intuitive-then-analytical differential diagnosis method. In the intuitive stage, the system retrieves relevant patient cases and disease documents to construct a preliminary disease list; in the analytical stage, a multi-disease classifier refines this list. The refined candidate diseases are then used to enhance subsequent response generation. 
Our experiments with IADDx demonstrate that a detailed diagnostic process enhances response accuracy and interpretability. In the third part, we investigate RQ (3) and develop methods that enable clinically aligned and knowledge-augmented reasoning processes. For clinically aligned reasoning, we propose EMULATION, a framework that emulates clinicians’ internal thought processes to generate responses aligned with clinician preferences. The framework comprises three modules: an abductive reasoning module and a deductive reasoning module that jointly serve as a diagnostic-analysis component to construct the differential diagnosis, and a thought alignment module that reprioritizes candidate diseases and produces CoT-style thought processes and final responses. To learn clinician preferences, we also collect a synthetic thought process dataset with the help of an LLM. For knowledge-augmented reasoning, we adopt Thought-RAG, which outlines a thought process to identify implicit knowledge needs and retrieves relevant knowledge accordingly for generation. We propose a joint learning framework, RAR², that simultaneously improves thought process generation and knowledge-augmented generation. Specifically, we first curate a training dataset comprising preference pairs across different generation objectives, then apply Direct Preference Optimization (DPO) to enhance reasoning both with and without external knowledge. Extensive experiments on several biomedical question answering datasets show that RAR² outperforms existing RAG baselines. A further test-time scaling analysis of thought process generation supports the scalability of the approach. In summary, this thesis advances clinical process-aware medical dialogue systems by integrating key aspects of clinical practice. We present models, system designs, and datasets that move systems from surface-level mimicry toward internal alignment with clinical practice. 
Experiments across medical dialogue and question-answering benchmarks show consistent gains over strong baselines, validating the effectiveness and scalability of our approach and pointing toward trustworthy, clinician-like interactions throughout the clinical encounter. |
| Rights: | All rights reserved |
| Access: | open access |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/14301

