Full metadata record
DC Field | Value | Language
dc.contributor | Department of Computing | en_US
dc.contributor.advisor | Luo, Xiapu (COMP) | en_US
dc.creator | Zhao, Kaifa | -
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/13949 | -
dc.language | English | en_US
dc.publisher | Hong Kong Polytechnic University | en_US
dc.rights | All rights reserved | en_US
dc.title | Defending against stealthy mobile malware | en_US
dcterms.abstract | The widespread popularity of Android, as one of the most widely-used mobile operating systems, is attributed to its ability to provide users with a wide range of convenient and entertaining options through its functional apps. Nevertheless, mobile users may face a risk to their privacy and property from potentially harmful apps that can be installed on their devices. | en_US
dcterms.abstract | This thesis focuses on combining Android static analysis, artificial intelligence techniques, and natural language processing techniques to investigate app behavior, discover vulnerabilities in Android malware detection systems, and understand Android apps' privacy policies. To safeguard user privacy from potentially harmful apps, we propose the following measures: (1) investigating the vulnerability of Android malware detection systems to evolving structural attacks and proposing defense solutions; (2) analyzing whether Android app privacy policies meet regulatory requirements; and (3) empirically evaluating the capacity of pre-trained large language models to identify regulation-required components in Android privacy policies. | en_US
dcterms.abstract | For (1), to investigate the vulnerability of Android malware detection (AMD) systems under evolving attacks and design effective defense solutions, we propose a Heuristic optimization model integrated with a Reinforcement learning framework to optimize our structural ATtack, namely HRAT, which is the first problem-space structural attack designed to deceive Android malware detection systems. HRAT employs four types of graph modification operations and corresponding bytecode manipulation techniques to generate executable adversarial apps that can evade detection. HRAT bridges the research gap between feature-space attacks, which generate only adversarial features to deceive machine learning models, and problem-space attacks, which generate complete adversarial objects, i.e., executable Android apps in our scenario. Our extensive experiments demonstrate that HRAT achieves effective attack performance and remains robust against obfuscation methods that do not affect the app's function call graph. In addition, we propose potential defense solutions to improve the robustness of AMD against such advanced attack methods. | en_US
dcterms.abstract | For (2), we construct CA4P-483, a novel large-scale human-annotated benchmark dataset of Chinese Android application privacy policies. Following a manual inspection of regulatory articles, we identify seven types of labels that are relevant to the regulatory requirements for apps' access to user data. We design a two-step annotation process to ensure label agreement, and our evaluation shows that the annotations achieve a Kappa value of 77.20%, indicating substantial agreement for CA4P-483. In addition, we evaluate robust and representative baseline models on our dataset and present our findings and potential research directions based on the results. Finally, we conduct case studies to explore the potential application of CA4P-483 in protecting user privacy. | en_US
dcterms.abstract | For (3), we empirically evaluate three widely used pre-trained large language models on the CA4P-483 dataset. This work aims to explore the capacity of LLMs in processing Chinese privacy policies and to uncover their potential to address compliance issues that are challenging for traditional NLP techniques. Building on our previous work with CA4P-483, we leverage the semantic understanding capabilities of pre-trained LLMs and apply carefully crafted prompts according to established prompt engineering principles to maximize the models' inference performance. Our evaluation reveals that state-of-the-art pre-trained LLMs still fall short of achieving satisfactory performance on the Chinese privacy policy dataset. The limitations may stem from the complexity of the language environment, the intricate cross-relationships among elements within privacy policies, and the models' current generalization capabilities. Based on our evaluation results, we also propose potential future research directions that include leveraging long-context LLMs to analyze privacy policies holistically and achieve overall semantic consistency, as well as training a dedicated large-scale privacy policy analysis model that incorporates multilingual datasets to address privacy policies across different languages and platforms. | en_US
dcterms.extent | xvii, 158 pages : color illustrations | en_US
dcterms.isPartOf | PolyU Electronic Theses | en_US
dcterms.issued | 2025 | en_US
dcterms.educationalLevel | Ph.D. | en_US
dcterms.educationalLevel | All Doctorate | en_US
dcterms.accessRights | open access | en_US
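
The third abstract paragraph above summarizes HRAT, which edits an app's function call graph with four modification operations until an ML-based detector no longer flags it. As a rough illustration only, the sketch below replaces HRAT's heuristic-optimization and reinforcement-learning search with a plain greedy random search over a networkx graph; the detector_score function, the stub node names, and the attack budget are all invented here, and the real system additionally rewrites bytecode so that every kept graph edit still yields a runnable app.

    import random
    import networkx as nx

    def detector_score(fcg: nx.DiGraph) -> float:
        """Hypothetical stand-in for an ML-based Android malware detector
        that scores a function call graph (higher = more malicious)."""
        n = max(fcg.number_of_nodes(), 1)
        return fcg.number_of_edges() / (n * n)  # toy heuristic: edge density

    def random_modification(fcg: nx.DiGraph) -> nx.DiGraph:
        """Apply one of the four structural operations named in the abstract:
        insert a node, insert an edge, delete a node, or rewire an edge."""
        g = fcg.copy()
        nodes = list(g.nodes)
        ops = ["add_node"]
        if len(nodes) >= 2:
            ops.append("add_edge")
        if len(nodes) >= 3:
            ops.append("delete_node")
        if g.number_of_edges() > 0:
            ops.append("rewire")
        op = random.choice(ops)
        if op == "add_node":
            new = f"benign_stub_{g.number_of_nodes()}"  # imaginary no-op method
            g.add_node(new)
            if nodes:
                g.add_edge(random.choice(nodes), new)
        elif op == "add_edge":
            u, v = random.sample(nodes, 2)
            g.add_edge(u, v)
        elif op == "delete_node":
            g.remove_node(random.choice(nodes))
        else:  # rewire: redirect one existing call to a different callee
            u, v = random.choice(list(g.edges))
            g.remove_edge(u, v)
            g.add_edge(u, random.choice(nodes))
        return g

    def greedy_structural_attack(fcg: nx.DiGraph, budget: int = 50) -> nx.DiGraph:
        """Keep a candidate modification only if it lowers the detection score."""
        best, best_score = fcg, detector_score(fcg)
        for _ in range(budget):
            candidate = random_modification(best)
            score = detector_score(candidate)
            if score < best_score:
                best, best_score = candidate, score
        return best

    # Example on a tiny made-up call graph
    g = nx.DiGraph([("onCreate", "sendSms"), ("onCreate", "readContacts"),
                    ("sendSms", "encrypt")])
    adv = greedy_structural_attack(g)
    print(detector_score(g), "->", detector_score(adv))

The thesis itself drives this search with the heuristic optimization model and reinforcement learning framework named in the abstract and pairs each graph operation with a bytecode manipulation; the greedy loop above is only meant to make the "modify the graph, re-query the detector, keep what evades" idea concrete.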
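
The CA4P-483 paragraph reports a Kappa value of 77.20% for the two-step annotation process. That figure is an inter-annotator agreement score; the snippet below shows how such a value is typically computed with Cohen's kappa, using invented annotator labels, since the abstract does not list the seven CA4P-483 label types. Scores between 0.61 and 0.80 are conventionally read as substantial agreement.

    from sklearn.metrics import cohen_kappa_score

    # Invented labels from two annotators for the same seven policy spans.
    annotator_a = ["data_collected", "data_collected", "third_party_sharing",
                   "retention_period", "user_right", "data_collected", "contact_info"]
    annotator_b = ["data_collected", "third_party_sharing", "third_party_sharing",
                   "retention_period", "user_right", "data_collected", "user_right"]

    kappa = cohen_kappa_score(annotator_a, annotator_b)
    print(f"Cohen's kappa: {kappa:.4f}")

In the actual dataset the agreement check would of course run over the full annotated corpus rather than a handful of toy labels.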
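
The final abstract paragraph describes prompting pre-trained LLMs to find regulation-required components in Chinese privacy policies. The sketch below shows the general shape such a prompt could take; the abstract does not name the three evaluated models, so the OpenAI chat API and model identifier here are stand-ins, and the label set and example sentence are invented.

    from openai import OpenAI

    client = OpenAI()  # stand-in client; assumes OPENAI_API_KEY is set in the environment

    SYSTEM_PROMPT = (
        "You are an expert in Chinese data-protection regulation. "
        "Given one sentence from an Android app's privacy policy, list every "
        "regulation-required component it contains, choosing only from: "
        "data_collected, purpose, third_party_sharing, retention_period, "
        "user_right, contact_info, children_policy. "
        "Answer as a JSON list of {\"label\": ..., \"text\": ...} objects; "
        "return [] if none apply."
    )

    sentence = "我们可能会收集您的设备信息和位置信息，用于改善服务质量。"  # invented example sentence

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": sentence},
        ],
    )
    print(response.choices[0].message.content)

Even with prompts along these lines, the evaluation reported in the thesis finds that current pre-trained LLMs still fall short of satisfactory performance on CA4P-483, motivating the proposed directions of long-context analysis and a dedicated multilingual privacy policy model.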

Files in This Item:
File | Description | Size | Format
8408.pdf | For All Users | 6.02 MB | Adobe PDF


Copyright Undertaking

As a bona fide Library user, I declare that:

  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13949