Similarity measures : algorithms and applications

Chan, Tsz Nam

Author:	Chan, Tsz Nam
Title:	Similarity measures : algorithms and applications
Advisors:	Yiu, Ken (COMP)
Degree:	Ph.D.
Year:	2019
Subject:	Hong Kong Polytechnic University -- Dissertations Image processing -- Digital techniques Computer algorithms Image analysis -- Data processing
Department:	Department of Computing
Pages:	xx, 177 pages : color illustrations
Language:	English
Abstract:	Similarity measures are the basic components for various problems such as image processing, computer vision, pattern recognition and machine learning problems. However, evaluating the similarity measures is normally the bottleneck for many applications. In this thesis, we highlight three computational intensive applications and propose efficient algorithms in these scenarios. The first application is object detection in images. Given a query image, this problem finds the most similar sub-image within a given target image. The problem can be formulated as the nearest neighbor search problem. In the context of computer vision, we also call this the template matching problem. The Euclidean distance is used to measure the dissimilarity between the query image and a sub-image. However, the time complexity of object detection for each query is the product of the sizes of sub-image and image, which is prohibited for fast object detection scenario. We propose two solutions which can significantly outperform the state-of-the-art method by 9-20 times faster. The second application is image retrieval. Existing image retrieval systems extract the feature histograms for all images. During the online phase, image retrieval systems return the k most similar images for each online image-query from the user. One robust similarity measure between two histograms is based on the Earth Mover's Distance (EMD). However, due to the cubic time complexity for evaluating EMD, it restricts the applicability to small-scale datasets. We present the approximation framework that leverages on lower and upper bound functions to compute approximate EMD with error guarantee. Under this framework, we present two solutions which can significantly outperform the existing exact or heuristic solutions. Our experimental studies demonstrate that our best solution can outperform the existing method by 2.38x to 7.26x times faster. The third application is (kernel) classification. In machine learning context, kernel function is the similarity measure between two multidimensional vectors, which are extracted by different feature extraction methods, based on different scenarios. Many machine learning models need to compute the weighted aggregation of kernel function values with respect to a set of multidimensional vectors and the query vector, using different types of kernel functions, for example: Gaussian, Polynomial or Sigmoid kernels. However, computing the online kernel aggregation function is normally expensive which limits its applicability for some real-time (e.g. network anomaly detection) or large-scale (e.g. density estimation/ classification for physical modeling) applications. We propose novel and effective bounding techniques to speed up the computation of kernel aggregation. We further boost the efficiency by leveraging index structures and exploiting index tuning opportunities. Experimental studies on many real datasets reveal that our proposed method achieves speedups of 2.5-738x over the state-of-the-art.
Rights:	All rights reserved
Access:	open access

Files in This Item:

File	Description	Size	Format
991022197537803411.pdf	For All Users	2.38 MB	Adobe PDF	View/Open

Copyright Undertaking

As a bona fide Library user, I declare that:

I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.

Show full item record

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/9915