Full metadata record
DC Field | Value | Language
dc.contributor | Department of Electrical and Electronic Engineering | en_US
dc.contributor.advisor | Hu, Haibo (EEE) | en_US
dc.creator | Duan, Jiawei | -
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/13699 | -
dc.language | English | en_US
dc.publisher | Hong Kong Polytechnic University | en_US
dc.rights | All rights reserved | en_US
dc.title | Benchmarking and enhancing the utility of differential privacy for data mining applications | en_US
dcterms.abstract | Differential Privacy (DP) offers robust guarantees for protecting individual data against malicious attacks in both industrial sectors (e.g., Apple and Google) and administrative sectors (e.g., the U.S. Census Bureau). In general, DP allows for efficient statistical analysis while safeguarding privacy, making it widely adopted in various data mining tasks such as frequency/mean estimation, private data publication, and private learning. However, there exists a trade-off between utility and privacy: enhancing one typically compromises the other. Despite significant efforts to mitigate this trade-off, critical limitations persist. Some solutions achieve high utility but are tailored to specific DP mechanisms or data mining tasks, thus lacking generality. Conversely, more general solutions often fail to deliver superior utility. This creates a dilemma where achieving both generality and effectiveness simultaneously remains challenging. | en_US
dcterms.abstract | The works in this thesis together compose a platform that theoretically benchmarks and generally enhances the utilities of various DP mechanisms in two prevalent data mining scenarios: statistical analysis and model training. The main contributions of this thesis are divided into three chapters, organized in a top-down order: high-dimensional statistics estimation [7, 51, 84, 88, 104], centralized learning [6, 82, 85, 160], and federated learning [150]. | en_US
dcterms.abstract | In the first chapter, we present LDPTube, an analytical toolbox that generalizes and enhances DP mechanisms for high-dimensional mean estimation. Specifically, we leverage the Central Limit Theorem (CLT) [43, 115], one of the most recognized theorems in statistics, to describe the mean square errors (MSEs) of various DP mechanisms. To optimize their MSEs, HDR4ME* uses regularizations to eliminate excessively noisy data, thereby achieving better utilities in high-dimensional mean estimation. The second chapter focuses on the utilities of private centralized learning. Here, we introduce GeoDP, a framework that first theoretically derives the impact of DP noise on model efficiency. Our analysis reveals that the existing perturbation methods introduce biased noise to the gradient direction, resulting in a sub-optimal training process. GeoDP addresses this issue by adding unbiased noise to the gradient direction, thereby improving model utilities. In the final chapter, we propose LDPVec, which theoretically analyzes and enhances model utility in federated learning under various DP mechanisms. Similar to mean estimation, the global aggregation step in federated learning averages noisy gradients from each local party, allowing the CLT to effectively describe model utilities. We observe that preserving the gradient direction is crucial, while the perturbed gradient magnitude can be adjusted through fine-tuning the learning rate or clipping. Consequently, LDPVec optimizes model efficiency by allocating (d-1)/d of the privacy budget to the gradient direction and 1/d to the gradient magnitude. | en_US
dcterms.extent | xiv, 157 pages : color illustrations | en_US
dcterms.isPartOf | PolyU Electronic Theses | en_US
dcterms.issued | 2025 | en_US
dcterms.educationalLevel | Ph.D. | en_US
dcterms.educationalLevel | All Doctorate | en_US
dcterms.LCSH | Data protection | en_US
dcterms.LCSH | Privacy -- Mathematical models | en_US
dcterms.LCSH | Data mining -- Security measures | en_US
dcterms.LCSH | Machine learning | en_US
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | en_US
dcterms.accessRights | open access | en_US
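
Illustrative note on the abstract: the budget allocation stated for LDPVec, (d-1)/d of the privacy budget to the gradient direction and 1/d to the gradient magnitude, can be written out as a short sketch. This is not code from the thesis. The function name split_privacy_budget, its parameters, and the requirement d >= 2 are assumptions made for illustration; only the two fractions come from the abstract.

```python
def split_privacy_budget(epsilon, d):
    """Split a total privacy budget between gradient direction and magnitude.

    Hypothetical helper: the (d-1)/d and 1/d fractions are taken from the
    abstract; the name, signature, and input checks are assumptions.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if d < 2:
        # Assumption: a direction only needs its own budget when d >= 2.
        raise ValueError("d must be at least 2")
    eps_direction = epsilon * (d - 1) / d  # share for the gradient direction
    eps_magnitude = epsilon / d            # share for the gradient magnitude
    return eps_direction, eps_magnitude


if __name__ == "__main__":
    eps_dir, eps_mag = split_privacy_budget(epsilon=1.0, d=4)
    print(eps_dir, eps_mag)
```

The two shares always add up to the total budget, and the direction's share approaches the whole budget as the dimension d grows.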

Files in This Item:
File | Description | Size | Format
8142.pdf | For All Users | 2.18 MB | Adobe PDF


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13699