Localized generalization error model with variable size of neighborhoods and applications in ensemble feature selection

Chan, Po-fong

Full metadata record

DC Field	Value	Language
dc.contributor	Department of Computing	en_US
dc.creator	Chan, Po-fong	-
dc.identifier.uri	https://theses.lib.polyu.edu.hk/handle/200/6468	-
dc.language	English	en_US
dc.publisher	Hong Kong Polytechnic University	-
dc.rights	All rights reserved	en_US
dc.title	Localized generalization error model with variable size of neighborhoods and applications in ensemble feature selection	en_US
dcterms.abstract	The Localized Generalization Error Model (L-GEM) provides an upper bound on the generalization error for unseen samples located in the Q-neighborhood of each training sample. It originates from the idea that expecting a classifier to correctly recognize unseen samples across the whole input space is unreasonable, as some are very different from the training samples. L-GEM has been applied successfully to a wide range of problems, including generic applications such as feature selection, architecture selection, classifier training and selection, and active learning. In domain-specific applications, it has been applied to semantic image classification, bankruptcy prediction, stock market investment strategy enhancement, RFID indoor positioning, content-based information retrieval, and steganalysis. A crucial parameter, Q, adjusts the size of the neighborhood of each training sample within which L-GEM bounds the generalization error. In the current literature, Q is selected in an ad hoc manner: a single value is chosen and used for all training samples, so all training samples have neighborhoods of the same size. However, certain training samples may be extremely close to one another, and a uniform neighborhood size may result in large overlaps. The first objective of this thesis is to study the selection of different Q values for individual training samples instead of a single value for all. In view of the high computational complexity of L-GEM with variable neighborhood sizes, the second objective is to propose a new point of view in which data are clustered into groups, and neighborhoods are considered for each cluster instead of each training sample.	en_US
dcterms.abstract	However, the trend of a classifier's performance over different neighborhood sizes is ignored. These fluctuations and trends provide important information for evaluating the classifier. Therefore, the third objective of the thesis is to propose a set of novel L-GEM-based indices that evaluate a classifier over a range of Q values, that is, over different neighborhood sizes. An overall comparison of the proposed methods with current methods for classifier selection is provided. In addition, the performance of the proposed methods in different scenarios with outliers is reported. In summary, the proposed methods are able to select classifiers with better performance and can be applied successfully in different applications. L-GEM has also been extended to multiple classifier systems (MCSs), and it has been shown to evaluate the generalization capability of an MCS successfully. Creating diverse sets of classifiers is a key issue in constructing an MCS. Ensemble feature selection is one approach to constructing an MCS; it varies the feature set used by each individual classifier. Promoting diversity alone may not produce an MCS with high generalization capability. Therefore, a genetic algorithm (GA) and the localized generalization error model for MCS (L-GEMMCS) are adopted to select sets of diversified feature groups for constructing an MCS with high generalization capability. The proposed approaches are evaluated on UCI benchmark datasets and real-world problems.	en_US
dcterms.extent	xix, 182 leaves : ill. ; 30 cm.	en_US
dcterms.isPartOf	PolyU Electronic Theses	en_US
dcterms.issued	2011	en_US
dcterms.educationalLevel	All Doctorate	en_US
dcterms.educationalLevel	Ph.D.	en_US
dcterms.LCSH	Neural networks (Computer science)	en_US
dcterms.LCSH	Artificial intelligence.	en_US
dcterms.LCSH	Hong Kong Polytechnic University -- Dissertations	en_US
dcterms.accessRights	open access	en_US

Files in This Item:

File	Description	Size	Format
b25072158.pdf	For All Users	6.83 MB	Adobe PDF

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/6468