Author: Ma, Chi-hung Patrick
Title: Effective techniques for gene expression data mining
Degree: Ph.D.
Year: 2006
Subject: Hong Kong Polytechnic University -- Dissertations.
Gene expression -- Data processing.
Data mining.
Department: Department of Computing
Pages: vii, 152 p. : ill. ; 30 cm.
Language: English
Abstract: Gene expression data mining, as a new research area, poses new challenges to data mining researchers. Gene expression data are typically very noisy and have very high dimensionality. Traditional data mining techniques may not be the best tools for bioinformatics problems involving such data, as they were not originally developed to deal with them. For this reason, new effective techniques are required. In this thesis, we propose several such techniques. In particular, these techniques can be used to address the problems of reconstructing gene regulatory networks and clustering gene expression data. The former is concerned with discovering gene interactions in order to infer the structures of gene regulatory networks. The latter is concerned with discovering clusters of co-expressed genes so that genes with similar expression patterns under different experimental conditions can be identified. To reconstruct gene regulatory networks, we propose an association-discovery technique, based on residual analysis and an information-theoretic measure, to detect whether or not there are interesting association relationships between genes. Given time-dependent gene expression data, this technique can reveal interesting sequential associations between genes for the effective inference of the structures of gene regulatory networks. The proposed association-discovery technique can also be used to find interesting association relationships between gene expression levels and cluster labels. Based on the discovery of such relationships, we have developed a two-phase clustering algorithm for gene expression data. This algorithm consists of an initial clustering phase and a second re-clustering phase. Using this two-phase approach, the algorithm is able to group genes whose cluster memberships cannot be easily determined by existing methods into the appropriate clusters.
Since the effectiveness of the two-phase clustering algorithm depends, to some extent, on that of the existing clustering method used in the first phase, we have developed a novel evolutionary clustering algorithm, called EvoCluster, that can be used in the first phase to overcome some of the limitations of existing methods. By making use of an evolutionary approach and the association-discovery technique, it is not only able to perform well in the presence of very noisy data but can also be used to discover overlapping clusters. For performance evaluation, the data mining techniques proposed in this thesis have been tested with simulated and real data, and the experimental results show that they are very promising.
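Illustrative note (not part of the thesis record): the abstract names residual analysis and an information-theoretic measure but does not give formulas. The sketch below shows one common way such quantities are computed for two discretized expression profiles: adjusted residuals of a contingency table, and mutual information in bits. The function names, the "low"/"high" labels, and the choice of these particular statistics are assumptions made for illustration and may differ from the method used in the thesis.

```python
import math
from collections import Counter


def adjusted_residuals(xs, ys):
    """Adjusted residual for each (x, y) label pair of two equal-length sequences.

    Assumption: expression levels were already discretized into labels.
    Values far from 0 (for example beyond +/-1.96) suggest that the pair
    co-occurs more or less often than expected under independence.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    result = {}
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            variance = (1 - row[x] / n) * (1 - col[y] / n)
            if variance == 0:
                continue  # a label that covers every sample carries no information
            standardized = (joint[(x, y)] - expected) / math.sqrt(expected)
            result[(x, y)] = standardized / math.sqrt(variance)
    return result


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts
        mi += p_xy * math.log2(count * n / (row[x] * col[y]))
    return mi


# Hypothetical discretized profiles of two genes over 20 conditions.
gene_a = ["low", "low", "high", "high"] * 5
gene_b = ["low", "low", "high", "high"] * 5
print(adjusted_residuals(gene_a, gene_b))
print(mutual_information(gene_a, gene_b))
```

In this toy input the two profiles are identical, so matching label pairs receive large positive residuals and mismatched pairs large negative ones. For time-dependent data, one profile could be shifted by one time step before the comparison to look for sequential associations; that shift is likewise an assumption, not a detail stated in the abstract.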
Rights: All rights reserved
Access: open access

Files in This Item:
File: b20592863.pdf
Description: For All Users
Size: 2.84 MB
Format: Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/223