Graph anomaly detection with diverse supervision signals

Zhou, Shuang

Author:	Zhou, Shuang
Title:	Graph anomaly detection with diverse supervision signals
Advisors:	Chung, Fu-lai Korris (COMP); Huang, Xiao (COMP)
Degree:	Ph.D.
Year:	2024
Subject:	Computer networks -- Security measures; Computer networks -- Monitoring; Computer security; Hong Kong Polytechnic University -- Dissertations
Department:	Department of Computing
Pages:	xx, 174 pages : color illustrations
Language:	English
Abstract:	Many real-world systems, including social media, molecular networks, and payment transaction networks, can be modeled as graphs. Anomalies exist in these networked systems, and even a few of them can cause serious losses. Based on feature compactness, anomalies roughly fall into two types: scattered anomalies (a.k.a. point anomalies) and rare categories (a.k.a. clustered anomalies or group anomalies). Scattered anomalies are individual instances that appear at random locations in feature space and deviate from the majority of other samples. For instance, accidents in a transportation network all differ from the normal status, but their causes may differ from one another. Rare categories are minority groups of data objects that are compact in feature space (i.e., they share similar behavior patterns) and, as a whole, deviate significantly from the vast majority. For example, in social networks (e.g., Twitter or Weibo), many fraudsters operate in different parts of the world and engage in unethical behaviors (e.g., sending phishing links, spreading sham publicity) for illegal gains. Hence, there is a pressing need to detect graph anomalies to prevent potential losses. Since label annotation is costly and time-consuming, most existing graph anomaly detection (GAD) methods are unsupervised, including community analysis methods, subspace-selection methods, residual-based methods, and deep learning methods. These unsupervised methods usually suffer from unsatisfactory detection accuracy. In some scenarios, a few labeled anomalies are available, e.g., a few money-laundering frauds identified in the historical records of PayPal. Additionally, diverse types of supervision signals, e.g., labeled normal data, labeled anomalies in auxiliary graphs, and human knowledge, may exist in real-world scenarios. These supervision signals can potentially benefit GAD and bring researchers new opportunities.
Hence, how to leverage diverse types of supervision to address key issues in GAD is an intriguing question. Here, we introduce our work that leverages supervision signals under different scenarios to address existing GAD issues. Specifically, in Chapter 3 we first present a novel abnormality-aware graph neural network (AAGNN) that incorporates general human knowledge about abnormal patterns into feature engineering for graph feature aggregation. It addresses the issue that vanilla graph neural networks may not perform well on GAD tasks. In Chapter 4, we propose a principled method called multi-hypersphere graph learning (MHGL) that leverages labeled anomalies and normal data to identify seen and unseen anomalies on an attributed graph. It addresses the issue that labeled anomalies usually cover only a limited set of anomaly types, so trained GAD models cannot detect novel types of anomalies that did not appear in training, i.e., unseen anomalies. In Chapter 5, we first reveal an existing issue: previous GAD works, which generally leverage either general human knowledge about abnormal patterns or the available labels, are adept at detecting either scattered or clustered anomalies but can hardly identify both effectively. We then raise an intuitive question: how can diverse graph anomalies be detected effectively with both the available labels and general knowledge about abnormal patterns? Accordingly, we introduce a unified framework called collaborative graph anomaly detection (Co-GAD) for this research problem. In Chapter 6, we propose a novel data augmentation method called AugAN that leverages supervision from multiple graphs to boost the generalizability of GAD models. It addresses the issue that trained GAD models may not perform well on new (sub)graphs (e.g., a newly extended area of the PayPal transaction network) with different distributions. In the last chapter, we summarize these works and describe possible directions for future work.
Rights:	All rights reserved
Access:	open access

Files in This Item:

File	Description	Size	Format
7389.pdf	For All Users	26.13 MB	Adobe PDF



Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/12955