Visual learning of pairwise similarity and relative order relationships

Wang, Faqiang

Author:	Wang, Faqiang
Title:	Visual learning of pairwise similarity and relative order relationships
Advisors:	Zhang, Lei (COMP); Zhang, David (COMP); Zuo, Wangmeng (COMP)
Degree:	Ph.D.
Year:	2018
Subject:	Hong Kong Polytechnic University -- Dissertations; Machine learning; Computer vision
Department:	Department of Computing
Pages:	xxii, 168 pages : color illustrations
Language:	English
Abstract:	Many computer vision problems can be viewed as pairwise relationship learning tasks on images: a model is learned to predict whether a given image pair belongs to a particular pairwise relationship. Among existing image pairwise relationships, the similarity relationship and the relative order relationship are the two most common in computer vision. Similarity learning methods aim to learn a proper similarity measure with which the similarity between images can be evaluated more effectively for classification. Similarity learning is widely applied in computer vision applications such as face verification and person re-identification. Unlike similarity, relative order is an antisymmetric relationship. The goal of relative order relationship learning is to learn a model that predicts the relative order between two images. It is applied to ranking tasks, e.g. relative attributes, and to regression tasks, e.g. age estimation and camera pose estimation. Although similarity and relative order relationship learning have been widely and successfully applied to many computer vision tasks, some issues remain to be studied. Similarity learning can be divided into two categories, i.e. Mahalanobis distance metric learning and deep similarity learning. For Mahalanobis distance metric learning, it is important to investigate the connections between metric learning and kernel classification, and to explore how kernel classification resources can be used in the research and development of new metric learning methods. It is thus interesting to investigate whether the similarity learning methods can be unified into a general framework, which would provide a good platform for developing new similarity learning algorithms.
As the single image representation (SIR) and the pairwise image representation (PIR) are both commonly used in deep learning methods, it is necessary to design a similarity function that fuses SIR and PIR to exploit the advantages of each. For relative order relationship learning, a crucial issue is how to learn the relative order relationship so as to improve the performance of both ranking and regression methods. Because some applications require multiple relative order relationships to be learned, it is also important to build a network architecture that achieves a better tradeoff between the variances and connections of the different relative order relationships. In this thesis, we aim to develop distance metric learning, deep similarity learning, single relative order relationship learning and multiple relative order relationship learning models for image pairs. In Chapter 2, we generalize several state-of-the-art metric learning methods, such as large margin nearest neighbor (LMNN) and information theoretic metric learning (ITML), into a kernel classification framework. First, doublets and triplets are constructed from the training samples, and a family of degree-2 polynomial kernel functions is proposed for pairs of doublets or triplets. Then, a kernel classification framework is established to generalize many popular metric learning methods such as LMNN and ITML. The proposed framework can also suggest new metric learning methods which, interestingly, can be implemented efficiently with standard support vector machine (SVM) solvers. Two novel metric learning methods, namely doublet-SVM and triplet-SVM, are then developed under the proposed framework. Experimental results show that doublet-SVM and triplet-SVM achieve classification accuracies competitive with state-of-the-art metric learning methods, but with significantly less training time.
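The doublet construction and the degree-2 polynomial kernel mentioned for Chapter 2 can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so everything here is an assumption made for illustration: the function names `make_doublets` and `doublet_kernel` are hypothetical, doublets are formed from all sample pairs, the label convention is -1 for same-class pairs and +1 otherwise, and the kernel is taken to be the squared inner product of the two difference vectors, which is one possible member of a degree-2 polynomial family.

```python
from itertools import combinations


def make_doublets(samples, labels):
    """Pair up training samples into doublets.

    Assumed convention: a doublet is labeled -1 when both samples share
    a class and +1 otherwise. All pairs are used here for brevity.
    """
    doublets, targets = [], []
    for i, j in combinations(range(len(samples)), 2):
        doublets.append((samples[i], samples[j]))
        targets.append(-1 if labels[i] == labels[j] else 1)
    return doublets, targets


def doublet_kernel(z_a, z_b):
    """Assumed degree-2 polynomial kernel between two doublets.

    Each doublet is reduced to the difference of its two samples, and
    the kernel value is the squared inner product of the differences.
    """
    diff_a = [p - q for p, q in zip(*z_a)]
    diff_b = [p - q for p, q in zip(*z_b)]
    dot = sum(u * v for u, v in zip(diff_a, diff_b))
    return dot ** 2


# Toy data: three 2-D samples, the first two in the same class.
samples = [[0, 0], [1, 0], [0, 2]]
labels = [0, 0, 1]
doublets, targets = make_doublets(samples, labels)

# Gram matrix over doublets, in the form a kernel SVM solver would consume.
gram = [[doublet_kernel(a, b) for b in doublets] for a in doublets]
```

Under these assumptions, a Gram matrix such as `gram` together with `targets` could be handed to any SVM solver that accepts a precomputed kernel, which is the sense in which metric learning is cast as kernel classification.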
In Chapter 3, we formulate metric learning as a kernel classification problem with a positive semidefinite constraint, and solve it by iterated training of SVMs. The new formulation is easy to implement and efficient to train with off-the-shelf SVM solvers. Two novel metric learning models, namely Positive-semidefinite Constrained Metric Learning (PCML) and Nonnegative-coefficient Constrained Metric Learning (NCML), are developed. Both PCML and NCML can guarantee the global optimality of their solutions. Experiments are conducted on handwritten digit classification, face verification and person re-identification to evaluate our methods. Compared with state-of-the-art approaches, our methods achieve comparable classification accuracy and are efficient in training. In Chapter 4, we analyze the connection between the SIR and PIR based approaches, and propose a novel similarity measure that fuses SIR and PIR to exploit their advantages and boost matching performance. A convolutional neural network (CNN) based similarity learning approach is proposed to jointly learn the SIR and PIR so as to optimize the proposed similarity measure. Our CNN consists of a sub-network shared by SIR and PIR, followed by two parallel sub-networks that extract the SIRs of the given images and the PIRs of the given image pairs, respectively. To reduce the computational cost, we adopt a shallow PIR sub-network consisting of only one convolutional layer, one pooling layer and one fully-connected layer. Both SIR and PIR can therefore be learned jointly in pursuit of better matching accuracy at moderate computational cost. Furthermore, the matching scores learned with pairwise comparison and triplet comparison objectives can be combined to improve matching performance. Experiments on the CUHK03, CUHK01 and VIPeR datasets show that the proposed method achieves favorable accuracy with modest training time.
In Chapter 5, we extend the deep siamese network from similarity learning to relative order relationship learning. We formulate the second-order image representation and the relative order relationship prediction function. We then propose an extended deep siamese CNN based method that uses a relative order loss, a mean square error (MSE) loss and a softmax loss to learn the relative order relationship. Furthermore, we find that the proposed method can also be applied to regression tasks, e.g. age estimation, although such tasks are not aimed at predicting a pairwise relationship. We conduct experiments on relative attribute ranking and age estimation tasks. The results show that the proposed method achieves state-of-the-art performance and outperforms the competing methods. In Chapter 6, we study the multiple relative order relationship learning problem for the camera pose estimation task. We treat this task as a multi-task learning (MTL) problem, in which the learning of each pose component is regarded as one learning task, and we propose a camera pose estimation method based on deep siamese networks. In the proposed method, we use the second-order representation of images to learn the relative order relationship, and adopt the relative order loss and the MSE loss to make the predicted poses and their relative order consistent with the ground truth. To jointly learn the multiple relative order relationships of the camera pose, we propose a deep siamese network consisting of two shared branches. Each branch consists of a spatial sub-network and a regression sub-network, which learn the spatial features and the regressors, respectively. The spatial sub-network is shared across all the learning tasks, so it can capture what is common to the different pose components. As the regressors of the pose components differ, the regression sub-networks of the different pose components are kept separate.
This separation lets the network capture the specificity of each pose component. The experimental results show that our proposed method has lower prediction error than PoseNet [67] and the nearest neighbor approaches. To sum up, we developed a kernel classification framework for metric learning, and proposed a series of distance metric learning models based on the framework, i.e. doublet-SVM, triplet-SVM, PCML and NCML. We also proposed a new similarity measure that fuses the SIR and PIR, and built a CNN to jointly learn these representations and the similarity measure. On the basis of the deep siamese network, we proposed a single relative order relationship learning model and applied it to ranking and regression tasks. For the camera pose estimation task, we extended the single relative order relationship learning model to multiple relative order relationship learning, and developed a CNN to model the variances and connections between the different relationships. In the future, we will study new image pairwise relationship indicators and new learning models. We will also investigate new applications of similarity and relative order relationship learning.
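The pairing of a relative order loss with an MSE loss, described above for Chapters 5 and 6, can be illustrated with a small sketch. The abstract does not state the form of the relative order loss, so this is only one plausible reading under stated assumptions: a hinge penalty on pairs whose predicted order contradicts the ground-truth order, added to the MSE on the two predictions. The function names and the `margin` and `weight` parameters are hypothetical, not taken from the thesis.

```python
def mse_loss(preds, targets):
    """Mean square error between predictions and ground-truth values."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def relative_order_loss(pred_a, pred_b, target_a, target_b, margin=0.0):
    """Assumed hinge-style penalty on a wrongly ordered pair.

    sign is +1 when a should rank above b, -1 when below, 0 for ties.
    Tied pairs contribute no order penalty in this sketch.
    """
    sign = (target_a > target_b) - (target_a < target_b)
    if sign == 0:
        return 0.0
    return max(0.0, margin - sign * (pred_a - pred_b))


def pair_loss(pred_a, pred_b, target_a, target_b, weight=1.0):
    """Combined objective for one image pair: MSE plus weighted order term."""
    regression = mse_loss([pred_a, pred_b], [target_a, target_b])
    order = relative_order_loss(pred_a, pred_b, target_a, target_b)
    return regression + weight * order


# Age-estimation style example: true ages 20 and 40.
wrong_order = pair_loss(30.0, 25.0, 20.0, 40.0)  # predicted order is reversed
right_order = pair_loss(22.0, 38.0, 20.0, 40.0)  # predicted order is correct
```

In this sketch the order term is nonzero only when the pair is ranked incorrectly, so it adds pressure on the relative order beyond what the per-image regression error already provides.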
Rights:	All rights reserved
Access:	open access

Files in This Item:

File	Description	Size	Format
991022144643203411.pdf	For All Users	832.71 kB	Adobe PDF

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/9560