Full metadata record
DC Field | Value | Language
dc.contributor | School of Nursing | en_US
dc.contributor.advisor | Qin, Jing (SN) | en_US
dc.creator | Feng, Yidan | -
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/14231 | -
dc.language | English | en_US
dc.publisher | Hong Kong Polytechnic University | en_US
dc.rights | All rights reserved | en_US
dc.title | Flexible modality integration for real-world medical AI : handling structural distinction, heterogeneity, and asynchronicity in multimodal healthcare data | en_US
dcterms.abstract | Medical artificial intelligence demands robust multimodal fusion to integrate diverse data streams, including anatomical/functional imaging, heterogeneous clinical variables, and irregular longitudinal measurements, for comprehensive clinical insights. However, existing multimodal fusion methods typically assume fixed modality availability, severely limiting their real-world applicability in dynamic clinical environments characterized by institutional resource disparities, patient-specific contraindications, evolving diagnostic workflows, and temporal irregularities in data acquisition. Consequently, flexible modality integration is indispensable for clinical translation, yet significant technical challenges persist: 1) reliance on complete modality sets during training severely limits data utilization and generalization to partial inputs; 2) existing inter-modal alignment strategies inadequately preserve task-specific unique semantics while adapting to dynamically changing inputs; 3) architectural inflexibility hinders scalable integration of novel modalities; and 4) effective modeling of asynchronous temporal-modality dependencies remains critically underexplored. This thesis addresses these core challenges by developing a set of solutions for clinically adaptive multimodal learning, enabling robust integration of arbitrary modality subsets across diverse medical scenarios. Three clinically representative applications were selected to validate our approach across the multimodal integration spectrum: | en_US
dcterms.abstract | 1. Multimodal MRI synthesis: A typical dense prediction task where complementary sequences are fundamental for soft-tissue characterization yet frequently compromised by variable acquisition success in clinical practice. A unified method is proposed to reconcile the artificial fragmentation between cross-modality synthesis (CMS) and multi-contrast super-resolution (MCSR) through fine-grained difference learning. Spatial misalignments inherent in clinical scans are resolved via multi-scale deformable convolutions, while modality-specific structural distinctions are recovered through a synergistic mechanism comprising a difference projection discriminator, distinction-aware feature regularization, and incremental feature modulation. This approach achieves consistent high-fidelity reconstruction across extreme degradation levels (2–16× undersampling), significantly outperforming task-specific alternatives. | en_US
dcterms.abstract | 2. Alzheimer's diagnosis with heterogeneous modalities: A prevalent clinical condition requiring diagnostic synthesis of diverse and inherently imbalanced multimodal data. The proposed AnyMod architecture addresses combinatorial missing-modality complexity and semantic heterogeneity by enabling training and inference on arbitrary combinations of imaging and non-imaging data. Its core innovations include representation-task decoupled alignment, which preserves modality-unique semantics while mapping heterogeneous inputs to class-invariant prototypes; modality-agnostic Transformer projectors that eliminate dedicated encoders; and dynamic token clustering that ensures computational scalability across modality combinations. Validation demonstrates increasing performance advantages over combination-specific models as modality count grows, with seamless extensibility to unseen modalities. | en_US
dcterms.abstract | 3. Dynamic acute respiratory distress syndrome (ARDS) risk monitoring with asynchronous modalities: A critical adverse event in the ICU demanding continuous risk assessment from inherently asynchronous data streams (sparse CXRs, high-frequency vitals, intermittent labs). Effective integration of these irregularly sampled modalities is achieved through modality-wise encoding with adaptive positional encodings that preserve temporal-semantic relationships. The framework incorporates a Staged Temporal-Modal Fusion module that decouples cross-modal interaction from temporal processing, complemented by a Progressive Context Memory enabling computationally efficient long-range dependency modeling. The framework provides hourly risk stratification with time-to-onset quantification (AUROC 0.94 at <6 h pre-onset), revealing a 20-fold higher ARDS incidence in high-risk cohorts. | en_US
dcterms.abstract | All methods are validated on publicly available datasets, demonstrating performance gains over state-of-the-art techniques. By systematically addressing clinical and technical barriers, including data inefficiency, semantic heterogeneity, architectural rigidity, and temporal irregularities, this work advances multimodal learning toward clinically adaptive, data-efficient, and equitable AI-driven healthcare. | en_US
dcterms.extent | xvi, 138 pages : color illustrations | en_US
dcterms.isPartOf | PolyU Electronic Theses | en_US
dcterms.issued | 2025 | en_US
dcterms.educationalLevel | Ph.D. | en_US
dcterms.educationalLevel | All Doctorate | en_US
dcterms.accessRights | open access | en_US

Files in This Item:
File | Description | Size | Format
8685.pdf | For All Users | 3.52 MB | Adobe PDF


Copyright Undertaking

As a bona fide Library user, I declare that:

  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/14231