Author:  Hu, Lun 
Title:  Discovering patterns in complex networks with applications to link analysis and clustering 
Degree:  Ph.D. 
Year:  2015 
Subject:  Neural networks (Computer science); Cluster analysis; System analysis; Hong Kong Polytechnic University -- Dissertations 
Department:  Dept. of Computing 
Pages:  xvi, 136 pages : illustrations ; 30 cm 
Language:  English 
InnoPac Record:  http://library.polyu.edu.hk/record=b2816341 
URI:  http://theses.lib.polyu.edu.hk/handle/200/8081 
Abstract:  A network consists of a set of objects and their connections, and a complex network is a network that has a nontrivial topology. A computational technique that can discover interesting patterns in complex networks has applications in a variety of research areas. For example, it can be used to discover protein complexes in protein-protein interaction networks, or to identify online user communities in social networks. Networks can be represented as graphs, with vertices representing objects and edges representing connections between objects. Hence, graph mining techniques have been used to discover patterns in networks. For many of them to work effectively, patterns are required to have specific topological properties in terms of density, maximal k-cliques, or betweenness centrality. But the attributes associated with the objects in a complex network are usually ignored, or treated separately, during the graph mining process. According to empirical studies on complex networks, associations are believed to exist between the attributes of objects and the links between objects, and thus they may provide valuable information for discovering interesting graph patterns. In this regard, we propose in this thesis a technique that can discover associative patterns in complex networks by taking into consideration the associations between attribute and topology information during the pattern discovery process. This technique works with what are called attributed graphs (AGs). Associated with each vertex in such a graph is an attribute set in which each attribute can take more than one value. To discover associative patterns is to discover regularities between the attribute and topology information of AGs. A simple but feasible way to represent them is to make use of pairs of attribute values that are observed significantly often in connected vertices of the given AG. 
That is to say, if the frequency of co-occurrence of the respective attribute values in two connected vertices is significantly higher than expected, the co-occurrence of the two attribute values is the associative pattern of interest. Hence, for two attribute values, to determine whether the frequency of their co-occurrence is significantly higher, we make use of statistical analysis to determine whether the conditional probability of one attribute value given the other is significantly higher than the a priori probability of that attribute value occurring irrespective of other attribute values. If the difference is verified to be statistically significant, then the frequency of co-occurrence of the two attribute values can be considered significantly higher. In this case, the co-occurrence of these two values constitutes an associative pattern. Once such an associative pattern is identified, we further make use of an information-theoretic measure to indicate how significant this pattern is. Hence, for two interconnected objects that are represented as two vertices connected by a link in an AG, the association between them can be determined by the number and the significance of the associative patterns found between them. The proposed technique can hence discover associative patterns in AGs based on both topological and attribute information. A Degree of Association (DOA) measure is then introduced to compute the association between vertices based on the number and the significance of the associative patterns found in their attributes. The introduction of associative patterns allows us to fully utilize the potential knowledge in AGs in an efficient way, and we can use them to tackle a diversity of graph mining problems. For performance evaluation, we have used the technique to solve problems in link analysis and graph clustering. 
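The pattern test described above can be illustrated with a minimal sketch. The abstract does not name the statistical test, the information-theoretic measure, or the exact form of the DOA, so everything specific here is an assumption made for illustration: a standardized residual (z-score) stands in for the significance test, the log ratio of conditional to prior probability stands in for the pattern weight, and a plain sum of weights stands in for the DOA. The function names associative_patterns and degree_of_association and the parameter z_threshold are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def associative_patterns(edges, attrs, z_threshold=1.96):
    """Return {(a, b): weight} for attribute-value pairs that co-occur on
    connected vertices significantly more often than expected by chance.

    Assumption: significance is a one-sided z-test on the standardized
    residual; the thesis may use a different test.
    """
    pair = Counter()
    for u, v in edges:
        # Count each undirected edge in both directions. This doubles the
        # sample and overstates z somewhat; acceptable for a rough sketch.
        for x, y in ((u, v), (v, u)):
            for a in attrs[x]:
                for b in attrs[y]:
                    pair[(a, b)] += 1
    total = sum(pair.values())
    marginal = Counter()
    for (a, _b), n in pair.items():
        marginal[a] += n

    patterns = {}
    for (a, b), observed in pair.items():
        p_b = marginal[b] / total              # a priori probability of b
        p_b_given_a = observed / marginal[a]   # conditional probability of b given a
        expected = marginal[a] * p_b
        variance = expected * (1 - marginal[a] / total) * (1 - p_b)
        if variance <= 0:
            continue  # degenerate case: only one attribute value present
        z = (observed - expected) / math.sqrt(variance)
        if z > z_threshold:
            # Assumed weight: log ratio of conditional to prior probability.
            patterns[(a, b)] = math.log(p_b_given_a / p_b)
    return patterns


def degree_of_association(u, v, attrs, patterns):
    """Assumed DOA: sum of the weights of all significant patterns found
    between the attribute values of u and those of v."""
    return sum(patterns.get((a, b), 0.0) for a in attrs[u] for b in attrs[v])


# Toy attributed graph: two 5-vertex cliques, one labelled "x" and one "y",
# joined by two cross edges.
xs = [f"x{i}" for i in range(5)]
ys = [f"y{i}" for i in range(5)]
attrs = {v: {"x"} for v in xs}
attrs.update({v: {"y"} for v in ys})
edges = list(combinations(xs, 2)) + list(combinations(ys, 2))
edges += [("x0", "y0"), ("x1", "y1")]
patterns = associative_patterns(edges, attrs)
```

In this toy graph only the same-label pairs pass the test, so two vertices inside one clique receive a positive DOA while a cross-clique pair receives zero.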
For link analysis, associative patterns have been used to predict protein-protein interactions (PPIs) in protein-protein interaction networks (PPINs), with the protein sequences serving as attributes of the proteins in the network. An algorithm, VLASPD, has been developed based on the proposed technique to consider variable-length segments of each pair of interacting protein sequences to determine the association relationships that exist between these proteins. Unlike other sequence-based approaches, VLASPD is able to discover patterns in interacting proteins by considering associations between variable-length segments. As a result, it is able to make use of such patterns to more accurately predict whether two proteins may interact with each other. We have tested VLASPD with different real data sets, and the experimental results show that VLASPD can predict PPIs accurately and can be a promising approach for PPI prediction. For AG clustering, we first propose a fuzzy-based clustering approach, namely FCAG, by combining the topology and attribute information of AGs with the DOA measure. The adoption of fuzzy clustering allows FCAG to identify clusters in a natural manner. However, since there are also applications in which the number of clusters is unknown, we further develop an unsupervised clustering algorithm, namely MCLAG, to identify clusters through a Markov clustering process. Integrated with the DOA measure, MCLAG is able to discover dense graph clusters consisting of vertices whose attribute values may have significantly closer associations with each other. However, based on the experimental results of MCLAG, we note that vertices in the same cluster need not be similar over all attributes. Therefore, if we have a way to perform unsupervised clustering by relying on attributes that are more similar while ignoring those with less similarity, clusters can be identified more accurately and efficiently. 
To do so, we propose an algorithm, namely CAPAG, so that attribute preferences can be considered during clustering. To evaluate the performance of FCAG, MCLAG and CAPAG, we have applied them to several practical problems, including document classification, social community identification and the prediction of protein complexes. The experimental results show the promising performance of these three approaches. 
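For readers unfamiliar with the Markov clustering process mentioned above, the following is a minimal sketch of the generic Markov Cluster (MCL) procedure on a weighted graph. It is not MCLAG itself: the abstract does not say how the DOA measure enters the process, so the idea that edge weights could carry DOA scores is only an assumption, and the function names and the parameters inflation, max_iter and tol are hypothetical.

```python
def normalize_columns(m):
    """Scale every column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def markov_cluster(weights, inflation=2.0, max_iter=100, tol=1e-9):
    """Generic MCL on a symmetric weight matrix; returns a sorted list of
    clusters, each a sorted list of vertex indices."""
    n = len(weights)
    # Add self-loops, then turn the weights into column-stochastic
    # transition probabilities.
    m = normalize_columns(
        [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iter):
        # Expansion: one extra random-walk step (matrix squaring).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: raise entries to a power and renormalize, which
        # strengthens strong flows and weakens weak ones.
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break

    # Read clusters off the limit matrix: vertex j belongs with every
    # attractor row i that still sends it flow. Union-find merges them.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > 1e-6:
                parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


# Two triangles (0,1,2) and (3,4,5) joined by one weak edge between 2 and 3.
weights = [[0.0] * 6 for _ in range(6)]
for a, b, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
                (2, 3, 0.1)]:
    weights[a][b] = weights[b][a] = w
clusters = markov_cluster(weights)
```

The number of clusters is not supplied anywhere; it emerges from the flow simulation, which is the property the abstract cites as the reason for adopting a Markov clustering process. A larger inflation value generally yields finer clusters.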