Learning representations for discovering patterns in networks

Hu, Pengwei

Author:	Hu, Pengwei
Title:	Learning representations for discovering patterns in networks
Advisors:	Chan, C. C. (COMP)
Degree:	Ph.D.
Year:	2019
Subject:	Hong Kong Polytechnic University -- Dissertations; Machine learning; Computer algorithms
Department:	Department of Computing
Pages:	155 pages : color illustrations
Language:	English
Abstract:	A network is made up of a set of objects and the links between them. It can be represented as a graph, with vertices representing objects and edges representing the links. An algorithm capable of learning useful representations of a network has applications in many disciplines. For example, such a technique can be used to learn node representations in drug-target interaction networks for link prediction, or to extract a discriminative graph representation for social network analysis. An appropriate representation of a network makes it easier to extract valuable patterns when performing tasks such as link classification or node clustering. Suppose, for instance, that we want to learn a representation that makes density estimation easier: a distribution whose components are more independent is easier to model. The most common methods use feature selection in conjunction with machine learning. However, these methods have the disadvantage that they do not consider all of the available heterogeneous information. For example, approaches to learning representations in heterogeneous networks are expected to handle high dimensionality and multimodality. Hence, there is a need for algorithms that can learn representations retaining the heterogeneous information carried by the network. Beyond the integrity of the heterogeneous information, some studies also require the discovered patterns to be explainable. Prevalent approaches to learning network representations tend to pay more attention to network topology, so we also need representation learning algorithms that maintain interpretability with respect to the content features characterizing the nodes. In this thesis, we attempt to address these challenges by proposing effective approaches that are essential to a reliable framework for learning network representations for pattern discovery.
To transform data into a learnable form, we propose a multi-scale method that transforms the raw data into a multi-scale representation; this preliminary step is intended to capture the information in the data as fully as possible. Then, we propose multi-scale feature deep representations inferring interactions (MFDR) to classify links in a network. MFDR uses auto-encoders as the building blocks of deep networks to map high-dimensional features into a low-dimensional space. For learning representations from networks, we concentrate on two categories: integrated representation learning and interpretable representation learning. For integrated representation learning, we propose deep multiple networks fusion (DMNF), a novel graph clustering approach that learns a latent representation from multiple networks. To perform the task, DMNF first uses a fusion method to construct a network representing the degree of interrelationship between pairs of vertices. Given the fused network data, DMNF then learns the latent network representation using a deep neural network model. We also propose a new algorithm, deep network fusion (DFNet), to predict unknown links from the fused representation. Given heterogeneous networks, DFNet implements a network completion method that improves network confidence. For interpretable representation learning, we present GraphSE, which learns significant subgraphs in graphs so that these subgraphs can be used for the link prediction task. In a particular application, given attributed graphs, we can find a set of subgraphs that can be explained and that can be used to predict whether a node can be linked to a specific target. In the clustering tasks mentioned above, few of the latent network representations can be summarized. To address this challenge, we propose a novel latent representation model for community identification and summarization, named LFCIS.
To perform the task, LFCIS formulates an objective function that evaluates the overall clustering quality by taking into consideration both the edge topology and the node features of the network. Finally, we take a small step toward solving the unbalanced link prediction problem: we adopt a support vector data description to learn a one-class data representation for summarizing small samples. The approaches to data transformation and the models for learning network representations presented in this thesis have been used in various real applications. In particular, we have applied them to drug-target interaction prediction, drug and side-effect (SE) link prediction, and social network clustering. The experimental results show that the learned representations can improve the performance of traditional algorithms and outperform state-of-the-art approaches.
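The one-class data description mentioned in the abstract can be illustrated with a deliberately simplified sketch. It is not the thesis's implementation, and it is not a full support vector data description (which solves a constrained optimization problem, typically with kernels). It only shows the underlying idea of enclosing the known positive samples in a ball and testing whether a new sample falls inside it. The function names, the mean-based center, the `quantile` parameter, and the toy data are all assumptions made for illustration.

```python
import math

def fit_ball(samples, quantile=1.0):
    """Fit a ball around positive samples (hypothetical helper).

    Simplification: the center is the sample mean and the radius is a
    quantile of the distances to it; real SVDD optimizes both jointly.
    """
    dim = len(samples[0])
    center = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
    dists = sorted(math.dist(s, center) for s in samples)
    # Position of the requested quantile in the sorted distances.
    idx = min(len(dists) - 1, max(0, math.ceil(quantile * len(dists)) - 1))
    return center, dists[idx]

def inside(point, center, radius):
    """Return True if the point lies within the fitted ball."""
    return math.dist(point, center) <= radius

# Toy positive class (made-up values, e.g. embeddings of known links).
positives = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8)]
center, radius = fit_ball(positives, quantile=1.0)

near = inside((1.05, 0.95), center, radius)  # close to the positive samples
far = inside((3.0, 3.0), center, radius)     # far from the positive samples
```

With `quantile=1.0` the ball covers every training sample; a smaller value would treat the most distant positives as outliers, which loosely mirrors the slack that a real support vector data description allows.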
Rights:	All rights reserved
Access:	open access

Files in This Item:

File	Description	Size	Format
991022223557003411.pdf	For All Users	3.54 MB	Adobe PDF

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/9986