Author: Shi, Siping
Title: Privacy-preserving and robust federated learning on noisy environment
Advisors: Wang, Dan (COMP)
Degree: Ph.D.
Year: 2023
Subject: Machine learning
Federated database systems
Machine learning -- Security measures
Hong Kong Polytechnic University -- Dissertations
Department: Department of Computing
Pages: xxii, 151 pages : color illustrations
Language: English
Abstract: Federated learning is growing into an important paradigm for many data-intensive applications, ranging from prevalent personal mobile apps to privacy-sensitive verticals such as health and finance. In federated learning, a large number of devices learn a global model collaboratively without directly exposing their private local data. Specifically, each device trains a local model on its own data, and only the local model is sent to the server for aggregation. Although the distributed nature and data constraints of federated learning protect the privacy of each device, they also make federated learning more vulnerable to noisy environments, which can be classified into three categories: noisy models, noisy data, and noisy labels. Vanilla federated learning is not robust to any of these noises, and its performance is significantly affected, both in the learning process and in the accuracy of the learned global model. It is therefore necessary to design robust federated learning algorithms that defend against such noisy environments. In this thesis, we focus on improving the robustness of federated learning in noisy environments while preserving privacy, and we make the following original contributions.
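To make the protocol concrete, the following minimal sketch (ours, not the thesis's code) simulates one round of FedAvg-style training on a toy least-squares model: each device updates the global weights on its private data, and the server computes only a weighted average of the returned weights. All names and hyperparameters are illustrative.

    import numpy as np

    def local_update(w, X, y, epochs=5, lr=0.1):
        # Device-side step: train locally on private (X, y); here a least-squares model.
        w = w.copy()
        for _ in range(epochs):
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= lr * grad
        return w

    def fedavg_round(w_global, client_data):
        # Server-side step: devices send only model weights, never raw data.
        updates, sizes = [], []
        for X, y in client_data:
            updates.append(local_update(w_global, X, y))
            sizes.append(len(y))
        total = sum(sizes)
        return sum(u * (n / total) for u, n in zip(updates, sizes))

    # Toy usage: three simulated devices, each holding its own private data.
    rng = np.random.default_rng(0)
    clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(3)]
    w = np.zeros(3)
    for _ in range(10):
        w = fedavg_round(w, clients)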
Firstly, we study how to privately improve the robustness of federated learning when noisy models exist. Noisy models are caused by local model poisoning attacks, in which the attacker manipulates the shared local models during the federated learning process. In contrast to existing defense methods, which are passive in the sense that they try to mitigate the negative impact of the poisoned local models rather than eliminate them, we leverage the new federated analytics paradigm to develop a proactive defense method. More specifically, federated analytics carries out analytics tasks collectively without disclosing the local data of the edge devices.
We propose a Federated Anomaly Analytics enhanced Federated Learning (FAA-FL) framework, in which the clients and the server collaboratively analyze the anomalies. FAA-FL first inspects all the uploaded local models and separates out the potentially malicious ones. Then, it verifies each potentially malicious local model with functional encryption. Finally, it removes the verified anomalies and aggregates the remaining models to produce the global model. We comprehensively analyze the proposed FAA-FL framework and show that it is accurate, robust, and efficient.
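The three stages could be prototyped server-side as in the sketch below. The detection rule (distance from the coordinate-wise median) and the verification stub are our assumptions; the actual FAA-FL framework verifies candidates collaboratively under functional encryption, which is not reproduced here.

    import numpy as np

    def faa_fl_aggregate(local_models, z=2.0):
        # Stage 1: flag local models far from the coordinate-wise median as suspects.
        W = np.stack(local_models)                 # one row of weights per device
        median = np.median(W, axis=0)
        dists = np.linalg.norm(W - median, axis=1)
        cutoff = dists.mean() + z * dists.std()
        suspects = dists > cutoff

        # Stage 2: verify each suspect. Placeholder only: FAA-FL performs this
        # check collaboratively under functional encryption.
        def verify_is_malicious(model):
            return True  # in this toy sketch, every flagged model fails verification

        keep = [w for w, s in zip(local_models, suspects)
                if not (s and verify_is_malicious(w))]

        # Stage 3: aggregate the remaining models into the global model.
        return np.mean(keep, axis=0) if keep else median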
Secondly, we focus on optimizing the robustness of federated learning when differential privacy noise is added to the data for a privacy guarantee. Local differential privacy (LDP) is a prominent approach, widely adopted in federated learning to preserve the privacy of local training data, and it provides a rigorous privacy guarantee with computational efficiency in theory. However, a strong privacy guarantee under local differential privacy can degrade the robustness of the learned global model, and to date very few studies have focused on the interplay between LDP and the robustness of federated learning. We observe that LDP adds random noise to the data to guarantee the privacy of local data, thereby introducing uncertainty into the training dataset of federated learning, which in turn decreases robustness. To address this uncertainty-induced robustness problem, we propose to leverage the promising distributionally robust optimization (DRO) modeling approach. Specifically, we first formulate a distributionally robust and private federated learning problem (DRPri). While our formulation successfully captures the uncertainty generated by LDP, we show that it is not easily tractable. We therefore transform the DRPri problem into an equivalent problem under a Wasserstein distance-based uncertainty set, named the DRPri-W problem. We then design a robust and private federated learning algorithm, RPFL, to solve the DRPri-W problem. We analyze RPFL and theoretically show that it satisfies differential privacy with a robustness guarantee.
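In generic Wasserstein-DRO notation (a paraphrase; the exact DRPri-W formulation appears in the thesis), the learner minimizes the worst-case expected loss over all distributions within a Wasserstein ball around the empirical distribution of the LDP-perturbed data:

    \min_{\theta} \; \sup_{Q :\, W(Q, \hat{P}_n) \le \rho} \; \mathbb{E}_{z \sim Q}\big[ \ell(\theta; z) \big]

Here \hat{P}_n is the empirical distribution of the noised local samples, W is the Wasserstein distance, and \rho is the radius of the uncertainty set, so the learned model must perform well under every distribution the LDP noise could plausibly have produced.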
Finally, we explore the noisy labels issue of federated learning in the network traffic classification domain. Network traffic classifiers on mobile devices are widely learned with federated learning for privacy preservation. Noisy labels commonly occur on each device and deteriorate the accuracy of the learned network traffic classifier.
Existing noise elimination approaches attempt to solve this by detecting and removing noisy labeled data before training. However, they may lead to poor performance of the learned classifier, as little traffic data remains on each device after noise removal. Motivated by the observation that the features of noisy labeled traffic data are clean and that the underlying true distribution of the noisy labeled data is statistically close to that of the clean traffic data, we propose to utilize the noisy labeled data by normalizing it toward the clean traffic data distribution. Specifically, we first formulate a distributionally robust federated network traffic classifier learning problem (DR-NTC) to jointly take the normalized traffic data and the clean data into training. Then we specify the normalization function under the Wasserstein distance to transform the noisy labeled traffic data into a certified robust region around the clean data distribution, and we reformulate the DR-NTC problem into an equivalent DR-NTC-W problem. Finally, we design a robust federated network traffic classifier learning algorithm, RFNTC, to solve the DR-NTC-W problem. We also provide a theoretical analysis of the proposed algorithm to show its robustness guarantee.
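As an illustration of such a normalization (our simplification, not the thesis's specified function), the sketch below applies the per-feature affine map that matches the noisy data's first two moments to the clean data's; for univariate Gaussians this affine map is exactly the Wasserstein-2 optimal transport map, so it pushes the noisy-labeled features toward the clean distribution.

    import numpy as np

    def normalize_to_clean(noisy_X, clean_X, eps=1e-8):
        # Per-feature affine map: x' = mu_c + (sigma_c / sigma_n) * (x - mu_n).
        # Between univariate Gaussians this is the Wasserstein-2 optimal
        # transport map, moving noisy-labeled features toward the clean stats.
        mu_n, sd_n = noisy_X.mean(axis=0), noisy_X.std(axis=0) + eps
        mu_c, sd_c = clean_X.mean(axis=0), clean_X.std(axis=0)
        return mu_c + (sd_c / sd_n) * (noisy_X - mu_n)

The normalized traffic data can then join the clean data in training, as in the DR-NTC formulation described above.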
In summary, we propose three methods to overcome the challenges brought by noisy environments in federated learning while preserving privacy. All these methods are implemented and evaluated with various models and real-world datasets, and the experimental results demonstrate the effectiveness of the proposed methods.
Rights: All rights reserved
Access: open access

Files in This Item:
7197.pdf (For All Users), 23.59 MB, Adobe PDF



Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/12746