Author: Huang, Yuan
Title: Link prediction in microRNA-mediated biomolecular networks
Advisors: Chan, C. C. Keith (COMP)
Degree: Ph.D.
Year: 2020
Subject: MicroRNA
Bioinformatics
Computational biology
Data mining
Hong Kong Polytechnic University -- Dissertations
Department: Department of Computing
Pages: xiv, 124 pages : color illustrations
Language: English
Abstract: Many real-world problems can be formulated as determining whether a relationship exists between objects in a set of inter-related objects. For example, in molecular biology, microRNAs are known to be related to human diseases, as the two may interact with each other. While interactions between some microRNAs and diseases are known, others are not. One problem, therefore, is to determine whether an interaction exists between a microRNA and a human disease based on the known relationships between other microRNAs and human diseases. If we represent microRNAs and human diseases as nodes in a network and their interactions as links between those nodes, we obtain a biomolecular network. Given such a network, the link prediction problem can be defined as predicting the missing links in the network from the existing ones. In this thesis, we tackle the link prediction problem in three kinds of microRNA-mediated biomolecular networks. Specifically, we predict three types of interaction relationships between microRNAs and three other kinds of objects: (i) complex human diseases, (ii) drug resistance and (iii) lncRNAs. Based on known interaction data obtained from public databases, we construct microRNA-mediated biomolecular networks with unweighted links and two types of nodes: one type represents microRNAs, and the other represents diseases, drug resistance or lncRNAs. A link between two nodes of different types represents an interaction relationship between the corresponding objects. Given these networks, our problem is to use the known links to predict the missing ones.
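The formulation above can be made concrete with a small sketch. It assumes that known associations arrive as (microRNA, disease) pairs, builds the bipartite adjacency matrix, and scores every unobserved pair by counting paths of length three (microRNA, disease, microRNA, disease) through known links. This path-count score is a generic neighbourhood heuristic chosen only for illustration; it is not one of the algorithms proposed in the thesis. The function names and the identifiers in `TOY_PAIRS` (`miR-a`, `D1`, and so on) are hypothetical toy data.

```python
def build_adjacency(pairs):
    """Index microRNAs and diseases; A[i][j] = 1 marks a known link."""
    mirnas = sorted({m for m, _ in pairs})
    diseases = sorted({d for _, d in pairs})
    mi = {m: i for i, m in enumerate(mirnas)}
    di = {d: j for j, d in enumerate(diseases)}
    A = [[0] * len(diseases) for _ in mirnas]
    for m, d in pairs:
        A[mi[m]][di[d]] = 1
    return mirnas, diseases, A


def path3_scores(A):
    """Count microRNA-disease-microRNA-disease paths between every pair (i, j)."""
    n_m, n_d = len(A), len(A[0])
    # C[i][k]: number of diseases shared by microRNAs i and k
    C = [[sum(A[i][d] * A[k][d] for d in range(n_d)) for k in range(n_m)]
         for i in range(n_m)]
    # S[i][j]: sum over microRNAs k of (shared diseases with i) * (k linked to j)
    return [[sum(C[i][k] * A[k][j] for k in range(n_m)) for j in range(n_d)]
            for i in range(n_m)]


def rank_missing_links(pairs):
    """Return unobserved (score, microRNA, disease) triples, best first."""
    mirnas, diseases, A = build_adjacency(pairs)
    S = path3_scores(A)
    candidates = [(S[i][j], mirnas[i], diseases[j])
                  for i in range(len(mirnas))
                  for j in range(len(diseases))
                  if not A[i][j]]
    return sorted(candidates, key=lambda t: (-t[0], t[1], t[2]))


# Hypothetical toy associations (not real biological data)
TOY_PAIRS = [("miR-a", "D1"), ("miR-a", "D2"),
             ("miR-b", "D1"), ("miR-b", "D2"), ("miR-b", "D3"),
             ("miR-c", "D3")]

if __name__ == "__main__":
    for score, m, d in rank_missing_links(TOY_PAIRS):
        print(score, m, d)
```

Here `miR-a` and `D3` come out on top because `miR-a` shares two diseases with `miR-b`, which is already linked to `D3`. The thesis algorithms replace this heuristic with models that also draw on node information.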
In the datasets we collected, known interactions are often limited in number. To improve prediction performance, in addition to the known links, we introduce node information: data that are biologically relevant to the objects the nodes represent. These data may describe biological or physicochemical properties of the objects, such as expression profiles, drug structural data or RNA sequences, and their data types can differ considerably. For example, when predicting links in the microRNA-disease association network, the data we use to characterize the microRNA nodes can be another network, namely the lncRNA-microRNA interaction network. When predicting links in the microRNA-drug resistance association network, the data we use to characterize the drug and microRNA nodes are high-dimensional numerical features. When predicting links in the microRNA-lncRNA interaction network, the data we use to characterize the microRNA and lncRNA nodes are multiple similarity matrices. The main challenge of our research, therefore, lies in finding ways to incorporate these different kinds of node information into the prediction process.
To address this challenge, we propose four algorithms, each tackling a different aspect of it. To predict associations between microRNAs and diseases, the MVMTMDA algorithm accounts for the incompleteness of the lncRNA-microRNA interaction data. It formulates the prediction task as a multi-task problem, in which links in the lncRNA-microRNA interaction network and the microRNA-disease association network are predicted simultaneously, and adopts multi-view learning to learn embeddings of the microRNA nodes from the two networks. When predicting associations between microRNAs and drug resistance, the node attributes have up to thousands of dimensions, which is extremely high. The GCMDR algorithm uses a spectral graph convolution technique to address this problem. The deep neural network it adopts can be applied directly to high-dimensional numerical node features, allowing end-to-end prediction without any data preprocessing. Unlike existing microRNA target prediction tools, which are based on sequence matching, the EPLMI algorithm reformulates, for the first time, the lncRNA-microRNA interaction prediction task as a link prediction problem and adopts a two-way diffusion method to perform prediction. To further improve the prediction performance of EPLMI, we propose the LMNLMI algorithm, which uses a similarity network fusion technique to consider multiple types of lncRNA/microRNA similarity collectively. The proposed algorithms have been applied to real-world datasets that we collected from public databases. The experimental results show that the proposed models are accurate, efficient and robust to parameter settings, and that they outperform state-of-the-art approaches.
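The two-way diffusion idea can be sketched as follows. This is an illustration under stated assumptions, not EPLMI itself: the abstract does not give EPLMI's exact formulation, so the sketch uses plain mass diffusion (resource spreading) on the bipartite lncRNA-microRNA graph, run once from the microRNA side and once from the lncRNA side, and averages the two score matrices with a hypothetical weight `alpha`. The function names are likewise hypothetical.

```python
def transpose(M):
    """Return the transpose of a list-of-lists matrix."""
    return [list(row) for row in zip(*M)]


def mirna_side_scores(A):
    """Mass diffusion microRNA -> lncRNA -> microRNA.

    A[i][m] = 1 if lncRNA i is known to interact with microRNA m.
    Row i of the result scores every microRNA for lncRNA i.
    """
    n_l, n_m = len(A), len(A[0])
    k_l = [sum(row) for row in A]        # lncRNA degrees
    k_m = [sum(col) for col in zip(*A)]  # microRNA degrees
    scores = [[0.0] * n_m for _ in range(n_l)]
    for i in range(n_l):
        # Step 1: each microRNA linked to lncRNA i holds one unit of
        # resource and splits it equally among its lncRNA neighbours.
        on_lnc = [0.0] * n_l
        for b in range(n_m):
            if A[i][b]:
                for l in range(n_l):
                    if A[l][b]:
                        on_lnc[l] += 1.0 / k_m[b]
        # Step 2: each lncRNA splits what it received among its microRNAs.
        for l in range(n_l):
            if on_lnc[l]:
                for a in range(n_m):
                    if A[l][a]:
                        scores[i][a] += on_lnc[l] / k_l[l]
    return scores


def lncrna_side_scores(A):
    """Mass diffusion lncRNA -> microRNA -> lncRNA, via the transposed graph."""
    return transpose(mirna_side_scores(transpose(A)))


def two_way_scores(A, alpha=0.5):
    """Weighted average of the two one-sided scores (alpha is an assumption)."""
    fm = mirna_side_scores(A)
    fl = lncrna_side_scores(A)
    return [[alpha * x + (1 - alpha) * y for x, y in zip(rm, rl)]
            for rm, rl in zip(fm, fl)]


if __name__ == "__main__":
    # Hypothetical lncRNA x microRNA interaction matrix
    A = [[1, 1, 0],
         [1, 1, 1],
         [0, 1, 1]]
    S = two_way_scores(A)
    print(S[0][2])  # score of the unobserved pair (lncRNA 0, microRNA 2)
```

Because mass diffusion conserves resource, each row of the microRNA-side scores sums to that lncRNA's degree, and each column of the lncRNA-side scores sums to that microRNA's degree, which makes a convenient sanity check. Isolated nodes simply receive zero scores rather than causing a division by zero.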
Rights: All rights reserved
Access: open access

Files in This Item:
File: 5511.pdf; Description: For All Users; Size: 1.9 MB; Format: Adobe PDF

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/11041