Causally motivated collaborative learning across heterogeneous data distributions

Tang, Xueyang

Author:	Tang, Xueyang
Title:	Causally motivated collaborative learning across heterogeneous data distributions
Advisors:	Guo, Song (COMP)
Degree:	Ph.D.
Year:	2025
Subject:	Machine learning; Hong Kong Polytechnic University -- Dissertations
Department:	Department of Computing
Pages:	xviii, 160 pages : color illustrations
Language:	English
Abstract:	As data privacy receives growing attention in machine learning, federated learning (FL) has attracted increasing interest and developed rapidly in recent years. Federated learning allows a massive number of data holders (i.e., clients), each possessing limited data, to collaboratively train learning models in a privacy-preserving manner, with raw data kept locally. Traditional FL approaches develop a shared global model through periodic model aggregation to fit all the local datasets, which works well when the local data instances across clients are independent and identically distributed (IID). However, the performance of the resulting model can degrade significantly if the data distributions across participants are heterogeneous (i.e., if a data distribution shift exists among local clients). On the one hand, distribution shift across local training datasets can result in negative knowledge transfer between distant clients. On the other hand, distribution shift between training and test datasets can render the trained model incapable of generalizing effectively to unseen test data on each client. These challenges greatly impede the applicability of federated learning in practical scenarios. To address the challenge of data distribution shift in heterogeneous FL, we propose innovative frameworks for personalized federated learning in this thesis. First, prevalent personalized federated learning (PFL) methods handle the distribution shift across local training datasets by building a personalized model for each client under the guidance of a shared global model. However, a single global model may easily transfer deviated context knowledge to some local models when multiple latent contexts exist across local datasets.
We propose a concept called contextualized generalization (CG) to provide each client with fine-grained context knowledge that can better fit the local data distributions and facilitate faster model convergence. Theoretical analysis of the convergence rate and generalization error shows that our method, CGPFL, grants an O(√K) speedup over most existing methods and achieves a better personalization-generalization trade-off than existing solutions. Moreover, our theoretical analysis inspires a heuristic algorithm to find a near-optimal trade-off in CGPFL. Second, modern machine learning models tend to rely on shortcuts, which can perform well at the training stage but fail to generalize to unseen test data that exhibits a distribution shift relative to the training data. The limited data diversity on federated clients can exacerbate this issue, making it rather difficult to mitigate shortcuts while preserving personalization knowledge. We formulate structural causal models (SCMs) for heterogeneous federated clients and derive two significant causal signatures, which inspire a provable shortcut discovery and removal method. The proposed FedSDR consists of two steps: 1) using the available training data distributed among local clients to discover all the shortcut features in a collaborative manner; 2) developing the optimal personalized causally invariant predictor for each client by eliminating the discovered shortcut features. We provide theoretical analysis to prove that our method can identify the complete set of shortcut features and produce the optimal personalized invariant predictor that can generalize to unseen test data on each client. Third, while the preceding work makes an initial attempt to address the challenge of train-test distribution shift, it exhibits two notable limitations: 1) FedSDR offers theoretical guarantees only within linear feature spaces; 2) the server requires access to local environmental knowledge in FedSDR.
To mitigate these two limitations, we propose a crucial causal signature that can distinguish personalized features from spurious features, with global invariant features as the anchor. The novel causal signature is quantified as an information-theoretic constraint that facilitates shortcut-averse personalized invariant learning on each client. Theoretical analysis demonstrates that the novel method, FedPIN, can yield a tighter bound on generalization error than prevalent PFL approaches when train-test distribution shift exists on clients. Moreover, we provide a theoretical guarantee on the convergence rate of the proposed FedPIN. In summary, we address data distribution shift in heterogeneous federated learning by proposing three novel PFL methods. Experimental results in diverse settings demonstrate the effectiveness of the proposed methods compared with existing PFL approaches. Given that data distribution shift is prevalent in practical federated learning scenarios, our methods can not only contribute to the federated learning research community but also facilitate the deployment of federated learning in real-world applications.
Rights:	All rights reserved
Access:	open access

Files in This Item:

File	Description	Size	Format
8454.pdf	For All Users	13.33 MB	Adobe PDF

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13993