Author: He, Tiantian
Title: Mining clusters in attributed graphs
Advisors: Chan, C. C. Keith (COMP)
Degree: Ph.D.
Year: 2017
Subject: Hong Kong Polytechnic University -- Dissertations; Data mining; Graph theory -- Data processing; Graph algorithms
Department: Department of Computing
Pages: xviii, 146 pages : color illustrations
Language: English
Abstract: Much real-world relational data can be modeled as graphs whose vertices and edges represent, respectively, data entities and the relationships between them. One of the most important tasks is to discover graph clusters, or communities, which are interesting subgraphs in the graph data. Many computational methods have been proposed to find such clusters. Most of the prevalent approaches discover graph clusters by taking into consideration either topological properties of the graph, e.g., density and modularity, or vertex attributes. However, few effective computational approaches consider both topology and attributes when discovering clusters in graphs. In this thesis, we propose to discover graph clusters using the Attributed Graph, which contains a set of vertices, a set of edges, and attributes that are associated with the vertices. By combining the edge structure with the attributes, a computational method can discover clusters in the attributed graph that take both into consideration. Based on the Attributed Graph, we propose four different algorithms. Each of these four algorithms has its own characteristics and may address existing challenges in graph clustering. To discover interesting subgraphs in which vertices are inter-related, we propose MISAGA, an algorithm that identifies interesting subgraphs using both the edge structure and the degree of attribute association between pairs of vertices. MISAGA formulates the task of discovering k subgraphs as a constrained optimization problem and solves it with an iterative updating algorithm that identifies the optimal subgraph affiliation of each vertex. In each of the interesting subgraphs found by MISAGA, vertices are densely connected and their attribute values are significantly associated, although the values need not be the same.
Because no highly effective graph clustering algorithms are based on fuzzy set theory, we propose FSPGA, an algorithm for discovering fuzzy structural patterns in attributed graphs. FSPGA adopts an effective fuzzy clustering framework that allows overlapping clusters to be identified. Because the clusters identified in some real applications, e.g., functional modules in biological graphs, need to be connected components, we further propose two more algorithms, EGCPI and TBPCI, for identifying clusters of interest. Unlike other approaches, EGCPI formulates the task of discovering clusters in the attributed graph as an optimization problem and tackles it with evolutionary clustering. It can identify subgraphs in which vertices are densely connected and their attributes are more similar. TBPCI identifies clusters using local information about vertex connectedness and the attribute association between pairs of vertices in the attributed graph. TBPCI computes an optimal degree of boundedness for each pair of vertices, which captures how strongly the two vertices can be considered bound together. Clusters are then identified by grouping vertices whose shared degrees of boundedness are sufficiently strong. The proposed algorithms have been used in different real-world applications, including community detection in social network graphs and functional module identification in biological network graphs. The experimental results show that the proposed algorithms outperform state-of-the-art approaches.
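Illustration (not taken from the thesis): the attributed-graph idea described in the abstract can be sketched in a few lines of Python. The class name `AttributedGraph`, the Jaccard overlap, and the fixed threshold are assumptions chosen for brevity. They stand in for the attribute-association and degree-of-boundedness measures that the thesis defines, which this record does not specify, so the sketch should not be read as MISAGA, FSPGA, EGCPI, or TBPCI.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedGraph:
    """Hypothetical container: attribute sets per vertex plus undirected edges."""
    attributes: dict                           # vertex -> set of attribute values
    edges: list = field(default_factory=list)  # (u, v) pairs, treated as undirected

    def add_edge(self, u, v):
        self.edges.append((u, v))


def jaccard(a, b):
    """Attribute overlap in [0, 1]; an assumed stand-in for an association measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_threshold(graph, threshold=0.5):
    """Group vertices linked by edges whose endpoint attributes overlap enough.

    Edges below the threshold are ignored, and the remaining connected
    components (found with union-find) are returned as sorted lists.
    """
    parent = {v: v for v in graph.attributes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in graph.edges:
        if jaccard(graph.attributes[u], graph.attributes[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for v in graph.attributes:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(members) for members in groups.values())


# Toy data, invented for this example.
g = AttributedGraph(attributes={
    "a": {"x", "y"},
    "b": {"x", "y", "z"},
    "c": {"y", "z"},
    "d": {"q"},
})
g.add_edge("a", "b")
g.add_edge("b", "c")
g.add_edge("c", "d")
clusters = cluster_by_threshold(g, threshold=0.5)
```

On this toy graph, the edge between `c` and `d` is dropped because the two vertices share no attribute values, so `d` ends up alone even though it is connected. This is the basic effect of considering topology and attributes together rather than either one alone.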
Rights: All rights reserved
Access: open access

Files in This Item:
File: 991021965757703411.pdf
Description: For All Users
Size: 2.8 MB
Format: Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/9129