Clustering and classification on uncertain data

Pao Yue-kong Library Electronic Theses Database

Clustering and classification on uncertain data

 

Author: Xu, Lei
Title: Clustering and classification on uncertain data
Degree: Ph.D.
Year: 2013
Subject: Data mining.
Cluster analysis.
Classification.
Hong Kong Polytechnic University -- Dissertations
Department: Dept. of Computing
Pages: xvi, 113 p. : ill. ; 30 cm.
Language: English
InnoPac Record: http://library.polyu.edu.hk/record=b2653067
URI: http://theses.lib.polyu.edu.hk/handle/200/7266
Abstract: We study the problem of mining on uncertain objects whose locations are uncertain and described by probability density functions (pdf). Clustering and classification are two important tasks in data mining. Clustering on uncertain objects is different from traditional case on certain objects. UK-meansis proposed based on K-meansbutitis time consuming. Pruning techniques are proposed to improve the efficiency of UK-means. First we analyze existing pruning algorithms and experimentally show that there exists a new bottleneck in the performance due to the overhead of pruning candidate clusters for assignment of each uncertain object in each iteration. In this thesis, we will show that by considering squared Euclidean distance, UK-means (without pruning techniques) is reduced to K-means and performs much faster than pruning algorithms, however, with some discrepancies in the clustering results due to different distance functions used. Thus, we propose Approximate UK-means to heuristically identify objects of boundary cases and re-assign them to better clusters. In addition, we propose three models for the representation of cluster representative (certain model, uncertain model and heuristic model) to calculate expected squared Euclidean distance between objects and cluster representatives. The experimental results show that our approach (Approximate UK-means) reduces the discrepancies of K-means' clustering results by taking more time than K-means.
In the case of classification on uncertain objects, some existing algorithms are hundreds or thousands times more complex than traditional ones, because an uncertain object is represented by hundreds or thousands of samples. Due to the complex representation of uncertain objects and existing algorithms, it is time consuming to classify uncertain objects. In this thesis, we propose a novel supervised UK-means algorithm to classify uncertain objects more efficiently. In supervised UK-means, we consider to select features that can capture the relevant properties of uncertain data similarly to feature selection on certain objects. We also extend supervised UK-meansto ensemble learning. We experimentally demonstrate that our proposed approaches are more efficient than existing algorithms and can attain comparatively accurate results on non-overlapping data sets. In supervised UK-means, the classes are assumed to be well separated. But the real data are usually distributed arbitrarily and the classes cannot be separated by simple linear boundaries. We propose Supervised UK-means with Multiple Subclasses (SUMS) which considers that the objects in the same class can be further divided into several groups (subclasses) within the class and tries to learn the subclass representatives to classify objects more accurately. Moreover, we propose a Bounded Supervised UK-means with Multiple Subclasses (BSUMS) to avoid over-fitting. From our experiments, Supervised UK-means with Multiple Subclasses (SUMS) and BSUMS perform better than supervised UK-means on synthetic data sets and real data sets.

Files in this item

Files Size Format
b26530673.pdf 1.018Mb PDF
Copyright Undertaking
As a bona fide Library user, I declare that:
  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.
By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.

     

Quick Search

Browse

More Information