Author: Liu, Qiliang
Title: Scale-driven clustering of geographical point data
Degree: Ph.D.
Year: 2015
Subject: Cluster analysis; Spatial analysis (Statistics); Geography -- Statistical methods; Hong Kong Polytechnic University -- Dissertations
Pages: xv, 194 pages : illustrations (some color) ; 30 cm
Language: English
Abstract: Clustering is a technique for classifying or grouping similar observed data into clusters or categories. It plays a key role in geographical data analysis, e.g. for investigating the distribution of geographical data and observing the characteristics of clusters. Clustering sometimes also serves as an important pre-processing step for other data analysis techniques. A number of methods have been developed for clustering geographical data; however, two limitations remain. First, although many researchers have realized that clusters discovered from a geographical dataset are scale-dependent, most existing methods simply confirm whether or not a set of geographical data forms a cluster and are not able to detect multi-scale clusters. Second, a user-specified threshold is usually used to determine whether a set of geographical data can be identified as a cluster, so the significance of the discovered clusters cannot be evaluated in an objective way. Therefore, this study aims to tackle these two problems by mimicking the human perception of grouping at multiple scales. A scale-driven strategy is proposed to detect multi-scale statistically significant clusters from the most popular type of geographical data, i.e. geographical point data. Specifically, scale in clustering is first defined by data (sampling) scale and analysis (model) scale. Then, hypothesis testing is developed to construct the relationship between these two kinds of scales, and is further used to evaluate the significance of the clusters discovered at continuous analysis scales. Finally, scale is explicitly modeled as a parameter of the clustering model. Based on the proposed strategy, a specific scale-driven clustering model is developed for each of the four popular forms of geographical point data, i.e. spatial points, spatio-temporal events, spatial points with attributes and spatio-temporal variables.
A scale-driven clustering model is developed for the discovery of density-based clusters from spatial points. A statistical method based on the Delaunay triangulation network is developed to achieve adaptive selection of analysis scale. A method based on the Natural Principle and an iterative detection and removal method are proposed to control the data scale. Experiments on both simulated and real-life datasets show that, compared with existing algorithms, only the proposed model is able to detect multi-scale clusters that are consistent with human perception.
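The idea of treating analysis scale as an explicit parameter can be illustrated with a minimal sketch. It is not the model developed in the thesis: as a simplifying assumption, it links every pair of points instead of building a Delaunay triangulation network, and the scale values are picked by hand rather than selected statistically. The function name `clusters_at_scale` and the sample coordinates are hypothetical.

```python
import math
from collections import defaultdict

def clusters_at_scale(points, scale):
    """Group points connected by chains of pairwise distances <= scale.

    Assumption: all point pairs are candidate edges (a stand-in for the
    edges of a Delaunay triangulation), and `scale` is chosen by the caller.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Keep only edges no longer than the analysis scale and merge their ends.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= scale:
                parent[find(i)] = find(j)

    # Collect point indices by connected component.
    groups = defaultdict(list)
    for i in range(len(points)):
        groups[find(i)].append(i)
    return sorted(groups.values())

# Hypothetical sample: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (20, 20)]
fine = clusters_at_scale(points, 1.5)    # [[0, 1, 2], [3, 4, 5], [6]]
coarse = clusters_at_scale(points, 8.0)  # [[0, 1, 2, 3, 4, 5], [6]]
```

At the finer scale the two tight groups stay separate; at the coarser scale they merge, while the isolated point remains on its own at both scales.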
In the detection of density-based dynamic clusters from spatio-temporal events, a scale-driven clustering model is proposed. A method based on spatio-temporal classification entropy and stability analysis is developed to identify the optimal analysis scale. Experiments on both simulated and earthquake datasets show that, compared with existing algorithms, the proposed model is not only able to correctly discover clusters with different shapes and densities but also able to reduce the subjectivity in determining user-specified parameters to a minimum.
A scale-driven clustering model for detecting connectivity-based clusters formed by spatially contiguous objects with similar attribute values is developed. Clusters at continuous analysis scales are discovered by minimizing the reduction in the degree of homogeneity within clusters. A permutation test is proposed to identify significant clusters obtained at continuous analysis scales. Experiments on both simulated and meteorological datasets show that the proposed model is not only able to correctly discover clusters consistent with human perception but also able to overcome an inherently difficult problem of existing hierarchical clustering algorithms, i.e. the lack of a proper definition of the stopping criterion.
To detect clusters formed by neighbouring spatio-temporal variables with similar attribute values, a scale-driven clustering model is constructed. A fast permutation test that exploits topological relationships is proposed to identify significant clusters discovered at continuous analysis scales. Experiments on both simulated and temperature datasets show that, compared with existing algorithms, the proposed model is more efficient and effective for detecting significant clusters at continuous analysis scales.
In summary, this study aims to detect significant clusters at multiple scales for different applications. To achieve this, a scale-driven strategy is proposed and scale is explicitly represented in the parameterization of clustering models. Experimental results show that multi-scale significant clusters with different sizes, shapes and densities can be easily discovered by controlling the scale parameters, and the subjectivity in clustering of geographical data is significantly reduced.
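The role a permutation test plays in judging cluster significance can be sketched generically. The example below is an assumption-laden illustration, not the test proposed in the thesis: it uses within-cluster variance as the homogeneity statistic, ignores spatial contiguity and topological relationships, and does not correct for multiple testing. The function name `permutation_p_value` and the attribute values are hypothetical.

```python
import random
import statistics

def permutation_p_value(values, cluster_idx, n_perm=999, seed=0):
    """Estimate how often a random group is as homogeneous as the candidate.

    Assumption: homogeneity is measured by population variance; shuffling
    attribute values over locations is equivalent to drawing a random
    subset of the same size without replacement.
    """
    rng = random.Random(seed)
    observed = statistics.pvariance([values[i] for i in cluster_idx])
    size = len(cluster_idx)
    as_extreme = 0
    for _ in range(n_perm):
        sample = rng.sample(values, size)
        if statistics.pvariance(sample) <= observed:
            as_extreme += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return (as_extreme + 1) / (n_perm + 1)

# Hypothetical attribute values; the first four are nearly identical.
values = [10.0, 10.1, 9.9, 10.2, 3.0, 25.0, 17.0, 1.0, 30.0, 8.0, 22.0, 14.0]
p_tight = permutation_p_value(values, [0, 1, 2, 3])
p_loose = permutation_p_value(values, [4, 5, 6, 7])
```

A small `p_tight` indicates that a group this homogeneous rarely arises by chance, whereas the heterogeneous group yields a large `p_loose`. The significance level then replaces a user-specified similarity threshold as the decision criterion.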
Access: open access

Files in This Item:
b28068853.pdf (For All Users; 6.89 MB; Adobe PDF)
