Full metadata record
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor | Department of Computing | en_US |
| dc.creator | Lee, Ho-kei Sean | - |
| dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/266 | - |
| dc.language | English | en_US |
| dc.publisher | Hong Kong Polytechnic University | - |
| dc.rights | All rights reserved | en_US |
| dc.title | A genetic algorithm based approach for clustering categorical data | en_US |
| dcterms.abstract | Given a database of records, clustering is concerned with the grouping of similar records into different groups or clusters based on their attribute values. Many algorithms have been proposed in the past to address the clustering problem, but most of them were developed mainly to handle continuous-valued data. Relatively little attention has been paid to the clustering of categorical data. Given that this kind of data is very commonly collected in many applications in business, medicine, the social sciences, etc., it is important that an effective clustering algorithm be developed to handle such data. In this thesis, we propose such an algorithm. This algorithm is based on the use of a simple genetic algorithm (GA) that employs a probabilistic search technique for solutions that are supposedly optimal or near-optimal according to some performance criteria. This GA-based clustering algorithm makes use of an encoding scheme that can encode clustering results in chromosomes effectively. To work with this scheme, we also propose a set of genetic operators that can facilitate the exchange of clustering information between chromosomes on the one hand and allow variations to be introduced on the other. For the proposed GA to work well, we have also introduced a fitness function to evaluate clustering quality. This is based on an information-theoretic measure of how much the presence of a particular attribute value supports or refutes the classification of a record in a data set into a specific cluster. The higher its fitness value based on the evaluation function, the better the solution encoded in a chromosome. Unlike traditional algorithms, the proposed GA-based clustering algorithm has the advantage that it can automatically determine the number of clusters hidden in a dataset. The proposed algorithm has been tested with both simulated and real data; the results show that it is very promising and can have many real applications. | en_US |
| dcterms.extent | vii, 103 leaves : ill. ; 31 cm | en_US |
| dcterms.isPartOf | PolyU Electronic Theses | en_US |
| dcterms.issued | 2006 | en_US |
| dcterms.educationalLevel | All Master | en_US |
| dcterms.educationalLevel | M.Phil. | en_US |
| dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | en_US |
| dcterms.LCSH | Cluster analysis -- Data processing | en_US |
| dcterms.LCSH | Algorithms | en_US |
| dcterms.accessRights | open access | en_US |

Files in This Item:
| File | Description | Size | Format |
| --- | --- | --- | --- |
| b20697260.pdf | For All Users | 1.75 MB | Adobe PDF |




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/266