Feature representation for large-scale data set

Hu, Yanxing

Full metadata record

DC Field	Value	Language
dc.contributor	Department of Computing	en_US
dc.contributor.advisor	You, Jane (COMP)	en_US
dc.contributor.advisor	Liu, N. K. James (COMP)	en_US
dc.creator	Hu, Yanxing	-
dc.identifier.uri	https://theses.lib.polyu.edu.hk/handle/200/11412	-
dc.language	English	en_US
dc.publisher	Hong Kong Polytechnic University	en_US
dc.rights	All rights reserved	en_US
dc.title	Feature representation for large-scale data set	en_US
dcterms.abstract	Feature representation is one of the most important research topics in Machine Learning (ML). In machine learning, feature representation means mapping the raw data into a new feature space that can be effectively exploited in learning tasks. Many supervised and unsupervised approaches, including supervised dictionary learning, fuzzy and rough logic, Principal Component Analysis (PCA), and locally linear embedding, have been employed for feature representation of different types of data sets. The arrival of the big data era brings both opportunities and challenges to the study of feature representation. In real applications, the scale and complexity of the data far exceed those of earlier scenarios. On the one hand, the large volume of data enables more complicated models to be employed for feature representation; on the other hand, multiple data sources, complicated data structures, and high computational requirements bring new difficulties to feature representation for huge data sets. In this study, concentrating on the feature representation problem for large-scale data sets and related applications, new algorithms are proposed so that the obtained feature mapping enables better results for machine learning tasks. Our study starts with feature representation for data sets with discrete values. For such data sets, the features often contain categorical information about the data points. This study addresses the feature representation of this kind of data by providing a novel rough set-based feature reduction approach that efficiently and reliably extracts the necessary information in the features while removing redundant information from the data set.	en_US
dcterms.abstract	Our second work provides a matrix decomposition-based unsupervised pre-training approach for feature representation. One important class of unsupervised feature representation approaches is based on clustering models. However, clustering approaches are time-consuming, especially for large-scale data sets. An eigenvector-based unsupervised pre-training approach is therefore proposed for feature representation and used as the first layer of a Radial Basis Function Neural Network (RBFNN). Our third work concentrates on feature representation for data from multiple sources/views. A canonical correlation-based autoencoder model is proposed for the feature fusion representation of multi-domain data sets. The proposed model is then applied to wind speed forecasting to improve forecasting accuracy. Finally, we propose a localized generalization error-based data reduction approach. This approach can reliably reduce the training set for some large-scale data sets, which offers an idea for large-scale learning tasks. The approach is closely related to the distribution of the values of each feature; this work shows that the representation of the features can affect the number of training samples needed. In summary, we make the following contributions: (i) algorithms and applications for feature representation on different types of large-scale data sets; (ii) a multi-domain feature fusion approach and applications; (iii) algorithms for computing the safe regions for the sum-optimal point notification problem.	en_US
dcterms.extent	xviii, 175 pages : color illustrations	en_US
dcterms.isPartOf	PolyU Electronic Theses	en_US
dcterms.issued	2021	en_US
dcterms.educationalLevel	Ph.D.	en_US
dcterms.educationalLevel	All Doctorate	en_US
dcterms.LCSH	Machine learning	en_US
dcterms.LCSH	Algorithms -- Data processing	en_US
dcterms.LCSH	Data mining	en_US
dcterms.LCSH	Hong Kong Polytechnic University -- Dissertations	en_US
dcterms.accessRights	open access	en_US

Files in This Item:

File	Description	Size	Format
5850.pdf	For All Users	4.24 MB	Adobe PDF

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/11412