Full metadata record
DC Field                  Value                                                 Language
dc.contributor            Department of Electrical and Electronic Engineering   en_US
dc.contributor.advisor    Hu, Haibo (EEE)                                       en_US
dc.creator                Xu, Yuan                                              -
dc.identifier.uri         https://theses.lib.polyu.edu.hk/handle/200/13878      -
dc.language               English                                               en_US
dc.publisher              Hong Kong Polytechnic University                      en_US
dc.rights                 All rights reserved                                   en_US
dc.title                  Improvement of trapdoored model to catch adversarial attacks on neural networks   en_US
dcterms.abstract          As machine learning has advanced, the vulnerability of deep learning models has become increasingly apparent, and many studies of how to resist attacks have achieved solid results. One notable proposal is a honeypot method for protecting DNN models. It applies the principles of backdoor attacks, turning a model weakness into a defense against adversarial attacks: trapdoors are deliberately injected into the model as artificial weaknesses to attract attackers searching for adversarial samples. Their presence leads attackers to optimize toward the preset weaknesses, so the attacks they produce resemble the trapdoors in feature space. The defense system can then identify an attack by comparing the neuron activation signature of an input with that of the trapdoors (a minimal sketch of this comparison follows the record below).   en_US
dcterms.abstract          This thesis introduces adversarial attacks and several principal attack methods, briefly surveys the main current defense directions, and improves the honeypot-based adversarial attack detection model. The improvement changes how trapdoors are generated, aiming to separate clean samples from adversarial samples as cleanly as possible: a simulated annealing algorithm selects trapdoors whose neuron activation signatures, measured on samples with the trapdoor injected, lie as far as possible from the activation vectors of clean samples (a sketch of such an annealing loop also follows the record). Experiments show that the improved trapdoor model not only preserves accuracy on the classification task but also improves detection of the PGD, EN, and CW attack methods.   en_US
dcterms.extent            1 volume (unpaged) : color illustrations              en_US
dcterms.isPartOf          PolyU Electronic Theses                               en_US
dcterms.issued            2023                                                  en_US
dcterms.educationalLevel  M.Sc.                                                 en_US
dcterms.educationalLevel  All Master                                            en_US
dcterms.accessRights      restricted access                                     en_US
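To make the detection step described in the abstract concrete, below is a minimal Python sketch of one common way such a signature comparison is implemented: the trapdoor signature is the mean activation of a chosen layer over trapdoored inputs, and an input is flagged as adversarial when its cosine similarity to that signature exceeds a threshold calibrated on clean data. The function names, the choice of cosine similarity, and the percentile calibration rule are illustrative assumptions, not details taken from the thesis.

    import numpy as np

    def trapdoor_signature(trapdoored_activations):
        # One common choice of signature: the mean activation vector of a
        # chosen layer, computed over a batch of trapdoored inputs.
        return trapdoored_activations.mean(axis=0)

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def calibrate_threshold(clean_activations, signature, percentile=95.0):
        # Assumed calibration rule: pick the threshold so that only ~5% of
        # clean inputs would be flagged as adversarial.
        sims = [cosine_similarity(a, signature) for a in clean_activations]
        return float(np.percentile(sims, percentile))

    def is_adversarial(input_activation, signature, threshold):
        # Flag inputs whose activation looks suspiciously similar to the
        # trapdoor's activation signature.
        return cosine_similarity(input_activation, signature) > threshold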
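The simulated annealing selection can likewise be sketched generically. Everything below is an assumed skeleton rather than the thesis's actual implementation: separation_score and perturb are caller-supplied callables, where separation_score returns how far the activation signature induced by a candidate trapdoor lies from clean activation vectors (higher is better) and perturb proposes a slightly modified candidate (for example, jittering the pattern's position or intensity).

    import numpy as np

    def anneal_trapdoor(init_trapdoor, separation_score, perturb,
                        t0=1.0, cooling=0.95, steps=500, seed=0):
        # Standard simulated annealing: always accept improvements, and
        # accept worse candidates with probability exp(delta / temp),
        # which shrinks as the temperature cools.
        rng = np.random.default_rng(seed)
        current = best = init_trapdoor
        current_score = best_score = separation_score(current)
        temp = t0
        for _ in range(steps):
            candidate = perturb(current, rng)
            score = separation_score(candidate)
            if score > current_score or rng.random() < np.exp((score - current_score) / temp):
                current, current_score = candidate, score
                if score > best_score:
                    best, best_score = candidate, score
            temp *= cooling
        return best, best_score

Returning the best candidate seen, rather than the final one, guards against the random walk drifting away from a good trapdoor late in the cooling schedule.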

Files in This Item:
File      Description                                                         Size     Format
8285.pdf  For All Users (off-campus access for PolyU Staff & Students only)   2.54 MB  Adobe PDF


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13878