Author: Zhou, Jiyun
Title: Analysis on protein-DNA interaction and gene expression
Advisors: Lu, Qin (COMP)
Degree: Ph.D.
Year: 2019
Subject: Hong Kong Polytechnic University -- Dissertations
DNA-protein interactions
Gene expression
Department: Department of Computing
Pages: xviii, 229 pages : color illustrations
Language: English
Abstract: Gene expression is pivotal in genomic biology. As experimental methods for determining gene expression are costly and labor-intensive, there is an urgent need to develop high-performance computational methods for gene expression prediction. As gene expression is mainly regulated by interactions between DNA and transcription factors (TFs), a class of proteins with specialized functions, analysis of TF-DNA interactions may facilitate the prediction of gene expression. This thesis focuses on the analysis of protein-DNA interactions and gene expression. We address issues in four aspects of gene expression analysis: (1) protein secondary structure prediction, (2) DNA-binding residue prediction, (3) TF binding site (TFBS) prediction and (4) gene expression prediction. Our contributions consist of four corresponding parts. For protein secondary structure prediction, we present a novel deep-learning-based prediction method, referred to as CNNH_PSS, which uses a multi-scale CNN with highways to capture both local context and longer-range dependencies. In CNNH_PSS, one part of the information is delivered by highways from the current layer directly to the output of the next layer to preserve local context, while the other part is delivered from the current layer to the input of the next layer to capture dependencies among residues that are farther apart. Therefore, the feature space learned by CNNH_PSS contains both local context and long-range interdependencies.
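The highway idea can be illustrated with the standard highway-network gate, y = T * H(x) + (1 - T) * x, where a sigmoid gate T decides, per position, how much of the input is carried through unchanged and how much is replaced by a convolutional transform H. This is a minimal single-channel sketch, not the CNNH_PSS architecture: the function names, the tanh/sigmoid choices, the kernel values and the gate bias are all assumptions made for illustration. With width-3 kernels, each stacked layer widens the receptive field by one residue on each side, while the gate keeps a direct path for local information.

```python
import math
from typing import List, Sequence


def conv1d_same(x: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """1-D convolution with zero padding so the output has the same length as x."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd for 'same' padding")
    half = k // 2
    n = len(x)
    out = []
    for i in range(n):
        total = bias
        for j, w in enumerate(kernel):
            pos = i + j - half  # neighbouring position covered by kernel tap j
            if 0 <= pos < n:
                total += w * x[pos]
        out.append(total)
    return out


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def highway_conv_layer(x, h_kernel, t_kernel, t_bias=-1.0):
    """Hypothetical highway layer: y = T * H(x) + (1 - T) * x at every position."""
    h = [math.tanh(v) for v in conv1d_same(x, h_kernel)]        # transform path H(x)
    t = [sigmoid(v) for v in conv1d_same(x, t_kernel, t_bias)]  # gate T in (0, 1)
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


if __name__ == "__main__":
    # Toy per-residue signal: a single non-zero residue at position 5.
    signal = [0.0] * 11
    signal[5] = 1.0
    y = signal
    for _ in range(3):  # three stacked layers with width-3 kernels
        y = highway_conv_layer(y, [0.5, 1.0, 0.5], [0.2, 0.2, 0.2])
    print([i for i, v in enumerate(y) if v > 0])  # [2, 3, 4, 5, 6, 7, 8]
```

After three layers the single active residue influences positions 2 to 8, showing how depth extends the context while the carry term (1 - T) * x lets each position's own input pass through.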
For DNA-binding residue prediction, the research goal is to learn relationships among residues. In this thesis, four prediction methods are proposed for this purpose. The first method applies the PSSM (Position-Specific Score Matrix) distance transformation to encode local pairwise relationships between neighboring residues. The second method applies a Convolutional Neural Network (CNN) to learn relationships among several neighboring residues. The third method applies Long Short-Term Memory (LSTM) networks to learn both local and long-range relationships among residues. The last method uses two sliding windows to learn sequence relationships and structure relationships, respectively. For TF binding site (TFBS) prediction, three prediction methods are proposed. First, a novel method captures higher-order relationships among nucleotides by applying two CNNs to histone modifications and DNA sequence, respectively. Second, a multi-task framework is proposed to address the data sparseness issue in particular by leveraging the available cross-cell-type information. It learns features common to multiple cell types using a shared CNN and cell-type-specific features using a private CNN for each cell type. The last method performs cross-TF TFBS prediction by learning TFBSs from other TFs in the training set, which further addresses the lack of training data for some TFs. Current gene expression prediction methods can only be used for cell types or tissues in which ChIP-seq datasets for the most important TFs are available. However, for most human cell types and tissues, ChIP-seq datasets for most TFs are not available. In this work, a novel prediction method is proposed that first predicts TFBSs using our cross-cell-type and cross-TF prediction methods. The predicted TFBSs are then combined with histone modifications to learn feature representations for genes.
The advantage of this method is that it can predict gene expression for any cell type, regardless of whether TFBS data for the considered TFs are available. The proposed method automatically extracts combinatorial relationships among histone modifications and TFBSs. These relationships and TFBSs play important roles in regulating gene expression, and extracting them facilitates the understanding of gene expression regulation in humans.
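The step in which predicted TFBSs are combined with histone modifications can be pictured as building one feature matrix per gene. The sketch below is a hypothetical preprocessing routine, not the thesis's pipeline: it assumes per-base signal tracks on one chromosome (histone modification signals and predicted TFBS scores), averages each track in fixed-size bins around the gene's transcription start site (TSS), and stacks the results so that each row is a bin and each column is a track. The names `gene_feature_matrix` and `bin_means`, the TSS-centred window and the zero padding beyond the track ends are all assumptions. The learned model that extracts combinatorial relationships from such a matrix is not shown.

```python
from typing import Dict, List, Sequence


def bin_means(track: Sequence[float], start: int, end: int, bin_size: int) -> List[float]:
    """Mean signal in consecutive bins over [start, end); positions off the track count as 0."""
    means = []
    for b in range(start, end, bin_size):
        positions = range(b, min(b + bin_size, end))
        values = [track[p] if 0 <= p < len(track) else 0.0 for p in positions]
        means.append(sum(values) / len(values))
    return means


def gene_feature_matrix(
    tss: int,
    histone_tracks: Dict[str, Sequence[float]],
    tfbs_tracks: Dict[str, Sequence[float]],
    flank: int,
    bin_size: int,
) -> List[List[float]]:
    """Rows = bins over [tss - flank, tss + flank); columns = histone marks, then TFs (each sorted by name)."""
    start, end = tss - flank, tss + flank
    tracks = [histone_tracks[name] for name in sorted(histone_tracks)]
    tracks += [tfbs_tracks[name] for name in sorted(tfbs_tracks)]
    columns = [bin_means(t, start, end, bin_size) for t in tracks]
    return [list(row) for row in zip(*columns)]  # transpose: one row per bin


if __name__ == "__main__":
    histone = {"H3K4me3": [float(i) for i in range(10)]}  # toy histone signal
    tfbs = {"CTCF": [0.0] * 5 + [1.0] * 5}                # toy predicted TFBS scores
    print(gene_feature_matrix(tss=5, histone_tracks=histone, tfbs_tracks=tfbs, flank=4, bin_size=4))
    # [[2.5, 0.0], [6.5, 1.0]]
```

Because predicted TFBS scores enter as ordinary columns next to the histone marks, the same matrix can be built for a cell type that has no ChIP-seq data for the TF, which is the situation the method above targets.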
Rights: All rights reserved
Access: open access

Files in This Item:
File: 991022210744403411.pdf
Description: For All Users
Size: 5.59 MB
Format: Adobe PDF
