Full metadata record
DC Field | Value | Language
dc.contributor | Department of Computing | en_US
dc.contributor.advisor | Luo, Xiapu (COMP) | en_US
dc.creator | Zhao, Kaifa | -
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/13949 | -
dc.language | English | en_US
dc.publisher | Hong Kong Polytechnic University | en_US
dc.rights | All rights reserved | en_US
dc.title | Defending against stealthy mobile malware | en_US
dcterms.abstract | The widespread popularity of Android, as one of the most widely-used mobile operating systems, is attributed to its ability to provide users with a wide range of convenient and entertaining options through its functional apps. Nevertheless, mobile users may face a risk to their privacy and property from potentially harmful apps that can be installed on their devices. | en_US
dcterms.abstract | This thesis focuses on combining Android static analysis, artificial intelligence techniques, and natural language processing techniques to investigate app behavior, discover vulnerabilities in Android malware detection systems, and understand Android apps' privacy policies. To safeguard user privacy from potentially harmful apps, we propose the following measures: (1) investigating the vulnerability of Android malware detection systems to evolving structural attacks and proposing defense solutions; (2) analyzing whether Android app privacy policies meet regulatory requirements; and (3) empirically evaluating the capacity of pre-trained large language models to identify regulation-required components in Android privacy policies. | en_US
dcterms.abstract | For (1), to investigate the vulnerability of Android malware detection (AMD) systems under evolving attacks and design effective defense solutions, we propose a Heuristic optimization model integrated with a Reinforcement learning framework to optimize our structural ATtack, namely HRAT, which is the first problem-space structural attack designed to deceive Android malware detection systems. HRAT employs four types of graph modification operations and corresponding bytecode manipulation techniques to generate executable adversarial apps that can evade detection. HRAT bridges the research gap between feature-space attacks, which generate only adversarial features to deceive machine learning models, and problem-space attacks, which generate complete adversarial objects, i.e., executable Android apps in our scenario. Our extensive experiments demonstrate that HRAT achieves effective attack performance and remains robust against obfuscation methods that do not affect the app's function call graph. In addition, we propose potential defense solutions to improve the robustness of AMD against such advanced attack methods. | en_US
dcterms.abstract | For (2), we construct CA4P-483, a novel large-scale human-annotated benchmark dataset of Chinese Android application privacy policies. Following a manual inspection of regulatory articles, we identify seven types of labels that are relevant to the regulatory requirements for apps' access to user data. We design a two-step annotation process to ensure label agreement, and our evaluation shows that the annotations achieve a Kappa value of 77.20%, indicating substantial agreement for CA4P-483. In addition, we evaluate robust and representative baseline models on our dataset and present our findings and potential research directions based on the results. Finally, we conduct case studies to explore the potential application of CA4P-483 in protecting user privacy. | en_US
dcterms.abstract | For (3), we empirically evaluate three widely used pre-trained large language models on the CA4P-483 dataset. This work aims to explore the capacity of LLMs in processing Chinese privacy policies and to uncover their potential to address compliance issues that are challenging for traditional NLP techniques. Building on our previous work with CA4P-483, we leverage the semantic understanding capabilities of pre-trained LLMs and apply carefully crafted prompts according to established prompt engineering principles to maximize the models' inference performance. Our evaluation reveals that state-of-the-art pre-trained LLMs still fall short of achieving satisfactory performance on the Chinese privacy policy dataset. The limitations may stem from the complexity of the language environment, the intricate cross-relationships among elements within privacy policies, and the models' current generalization capabilities. Based on our evaluation results, we also propose potential future research directions that include leveraging long-context LLMs to analyze privacy policies holistically and achieve overall semantic consistency, as well as training a dedicated large-scale privacy policy analysis model that incorporates multilingual datasets to address privacy policies across different languages and platforms. | en_US
dcterms.extent | xvii, 158 pages : color illustrations | en_US
dcterms.isPartOf | PolyU Electronic Theses | en_US
dcterms.issued | 2025 | en_US
dcterms.educationalLevel | Ph.D. | en_US
dcterms.educationalLevel | All Doctorate | en_US
dcterms.accessRights | open access | en_US
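
The third abstract paragraph above summarizes HRAT, which edits an app's function call graph with four modification operations until an ML-based detector no longer flags it. As a rough illustration only, the sketch below replaces HRAT's heuristic-optimization and reinforcement-learning search with a plain greedy random search over a networkx graph; the detector_score function, the stub node names, and the attack budget are all invented here, and the real system additionally rewrites bytecode so that every kept graph edit still yields a runnable app.

    import random
    import networkx as nx

    def detector_score(fcg: nx.DiGraph) -> float:
        """Hypothetical stand-in for an ML-based Android malware detector
        that scores a function call graph (higher = more malicious)."""
        n = max(fcg.number_of_nodes(), 1)
        return fcg.number_of_edges() / (n * n)  # toy heuristic: edge density

    def random_modification(fcg: nx.DiGraph) -> nx.DiGraph:
        """Apply one of the four structural operations named in the abstract:
        insert a node, insert an edge, delete a node, or rewire an edge."""
        g = fcg.copy()
        nodes = list(g.nodes)
        ops = ["add_node"]
        if len(nodes) >= 2:
            ops.append("add_edge")
        if len(nodes) >= 3:
            ops.append("delete_node")
        if g.number_of_edges() > 0:
            ops.append("rewire")
        op = random.choice(ops)
        if op == "add_node":
            new = f"benign_stub_{g.number_of_nodes()}"  # imaginary no-op method
            g.add_node(new)
            if nodes:
                g.add_edge(random.choice(nodes), new)
        elif op == "add_edge":
            u, v = random.sample(nodes, 2)
            g.add_edge(u, v)
        elif op == "delete_node":
            g.remove_node(random.choice(nodes))
        else:  # rewire: redirect one existing call to a different callee
            u, v = random.choice(list(g.edges))
            g.remove_edge(u, v)
            g.add_edge(u, random.choice(nodes))
        return g

    def greedy_structural_attack(fcg: nx.DiGraph, budget: int = 50) -> nx.DiGraph:
        """Keep a candidate modification only if it lowers the detection score."""
        best, best_score = fcg, detector_score(fcg)
        for _ in range(budget):
            candidate = random_modification(best)
            score = detector_score(candidate)
            if score < best_score:
                best, best_score = candidate, score
        return best

    # Example on a tiny made-up call graph
    g = nx.DiGraph([("onCreate", "sendSms"), ("onCreate", "readContacts"),
                    ("sendSms", "encrypt")])
    adv = greedy_structural_attack(g)
    print(detector_score(g), "->", detector_score(adv))

The thesis itself drives this search with the heuristic optimization model and reinforcement learning framework named in the abstract and pairs each graph operation with a bytecode manipulation; the greedy loop above is only meant to make the "modify the graph, re-query the detector, keep what evades" idea concrete.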
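
The CA4P-483 paragraph reports a Kappa value of 77.20% for the two-step annotation process. That figure is an inter-annotator agreement score; the snippet below shows how such a value is typically computed with Cohen's kappa, using invented annotator labels, since the abstract does not list the seven CA4P-483 label types. Scores between 0.61 and 0.80 are conventionally read as substantial agreement.

    from sklearn.metrics import cohen_kappa_score

    # Invented labels from two annotators for the same seven policy spans.
    annotator_a = ["data_collected", "data_collected", "third_party_sharing",
                   "retention_period", "user_right", "data_collected", "contact_info"]
    annotator_b = ["data_collected", "third_party_sharing", "third_party_sharing",
                   "retention_period", "user_right", "data_collected", "user_right"]

    kappa = cohen_kappa_score(annotator_a, annotator_b)
    print(f"Cohen's kappa: {kappa:.4f}")

In the actual dataset the agreement check would of course run over the full annotated corpus rather than a handful of toy labels.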
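
The final abstract paragraph describes prompting pre-trained LLMs to find regulation-required components in Chinese privacy policies. The sketch below shows the general shape such a prompt could take; the abstract does not name the three evaluated models, so the OpenAI chat API and model identifier here are stand-ins, and the label set and example sentence are invented.

    from openai import OpenAI

    client = OpenAI()  # stand-in client; assumes OPENAI_API_KEY is set in the environment

    SYSTEM_PROMPT = (
        "You are an expert in Chinese data-protection regulation. "
        "Given one sentence from an Android app's privacy policy, list every "
        "regulation-required component it contains, choosing only from: "
        "data_collected, purpose, third_party_sharing, retention_period, "
        "user_right, contact_info, children_policy. "
        "Answer as a JSON list of {\"label\": ..., \"text\": ...} objects; "
        "return [] if none apply."
    )

    sentence = "我们可能会收集您的设备信息和位置信息，用于改善服务质量。"  # invented example sentence

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": sentence},
        ],
    )
    print(response.choices[0].message.content)

Even with prompts along these lines, the evaluation reported in the thesis finds that current pre-trained LLMs still fall short of satisfactory performance on CA4P-483, motivating the proposed directions of long-context analysis and a dedicated multilingual privacy policy model.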

Files in This Item:
File | Description | Size | Format
8408.pdf | For All Users | 6.02 MB | Adobe PDF


Copyright Undertaking

As a bona fide Library user, I declare that:

  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13949