Author: Zhang, Anshu
Title: Uncertainty-based spatial association rule mining
Advisors: Shi, Wenzhong (LSGI)
Degree: Ph.D.
Year: 2017
Subject: Hong Kong Polytechnic University -- Dissertations
Geographic information systems
Geospatial data
Data mining
Department: Department of Land Surveying and Geo-Informatics
Pages: x, 155 pages : illustrations
Language: English
Abstract: Spatial association rule mining (SARM) is the discovery of implicit 'antecedent --> consequence' rules from spatial databases. SARM is an emerging topic in geographical information science (GISc) and a powerful tool in research and practice. The usefulness of SARM results depends on their reliability: the abundance of authentic rules, control over the risk of spurious rules, and the goodness of rule interestingness measure (RIM) values. Such reliability, however, faces great challenges from uncertainties of various types and sources. Uncertainty-based SARM, proposed in this thesis, aims to enhance the reliability of SARM results in all three of these aspects via novel and improved uncertainty handling methods. In response to three critical uncertainty issues in SARM (data error, gradual/vague spatial concepts, and uncertain concept modelling), this thesis realises the following three interrelated objectives.
Mining significant spatial association rules from uncertain data: a new statistical test on the rules is developed to correct the existing statistically sound test, which is indispensable for strict control over spurious rules, for distortions due to data error. The new test combines original data error propagation modelling with simulative processes. On average, the new method can compensate for 50% of the loss of true rules due to data error, thus markedly enriching the authentic results. This efficacy is also largely robust to inaccurate data error information and to dependent error probabilities in imperfect practical data.
Gaussian-curve-based fuzzy data discretization and crisp-fuzzy SARM: a Gaussian-curve-based model is presented to strengthen spatial semantics in fuzzy data discretization. In addition, crisp-fuzzy SARM is introduced to synthesise statistically sound testing based on ordinary (crisp) SARM with RIM evaluation based on fuzzy rules. These techniques can discover at least twice as many authentic rules as conventional fuzzy SARM; avoid the large overestimations of RIM values, usually by more than 50%, found in ordinary SARM; and keep the risk of spurious rules minimal.
Genetic algorithm (GA) for crisp-fuzzy SARM: the new GA integrates the merits of statistical evaluation, the new Gaussian-curve-based data discretization, and crisp-fuzzy SARM. Experimentwise and generationwise adjusted statistical tests are devised for the GA to satisfy different user needs. The proposed GA can produce several times as many rules as non-GA SARM, with equally high RIM values. The risk of spurious rules stays below low user-specified levels for both testing approaches.
The developments for the three objectives are proven effective and robust through synthetic and real-world data experiments under various experimental settings and data conditions. Case studies applying these developments to urbanization-socioeconomic changes, wildfire risks, and hotel room price determinants contribute new findings to the corresponding research topics. In sum, the methods developed in this thesis can alleviate manifold uncertainty issues in SARM, thereby significantly improving the reliability of SARM results in all three aspects. As a systematic study of uncertainty handling in SARM, this thesis enriches GISc theories and methodologies. In particular, it answers the increasingly pressing need for quality and reliability studies in GISc. The work is also practically useful for improving decision making and user services in various domains involving spatial data, as exemplified by the case studies.
Rights: All rights reserved
Access: open access

Files in This Item:
File: 991021952840003411.pdf
Description: For All Users
Size: 2.08 MB
Format: Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/9033