|Title:||Learning discriminative models and representations for visual recognition|
|Advisors:||Zhang, Lei (COMP)|
|Subject:||Hong Kong Polytechnic University -- Dissertations|
|Department:||Department of Computing|
|Pages:||xviii, 131 pages : color illustrations|
|Abstract:||In the past decade, visual recognition systems have witnessed major advances that led to record performances on challenging datasets. However, designing effective recognition algorithms that exhibit robustness to the sizeable extrinsic variability of visual data, particularly when the available training data are insufficient to learn accurate models, is a significant challenge. In this thesis, we focus on designing effective models and representations for visual recognition by exploiting the characteristics of visual data and vision problems and taking advantage of classic sparse models and state-of-the-art deep neural networks. The first part of this thesis is dedicated to providing a probabilistic interpretation for general sparse/collaborative representation based classification. Through a series of probabilistic modelling steps for sample-to-sample and sample-to-subspace relationships, we present a probabilistic collaborative representation based classifier (ProCRC) that not only reveals the inner relationship between the coding and classification stages of the original framework, but also achieves superior performance on a variety of challenging visual datasets when coupled with convolutional neural network (CNN) features. To address the inherent difficulties in detecting parts and estimating appearance in the fine-grained visual categorization (FGVC) problem, we consider the semantic properties of CNN activations and propose an end-to-end architecture based on a kernel learning scheme to capture the higher-order statistics of convolutional activations for modelling part interaction. The proposed approach yields more discriminative representations and achieves competitive results on the widely used FGVC datasets even without part annotation. We also consider weakly-supervised learning from web videos to alleviate the data scarcity issue in video summarization.
This is motivated by the fact that the publicly available datasets for video summarization remain limited in size and diversity, making it difficult for most supervised approaches to learn reliable summarization models. We investigate a generative summarization model by extending the variational autoencoder framework to accept both the benchmark videos and a large number of web videos. A variational encoder-summarizer-decoder (VESD) is proposed to identify the important segments of raw video using an attention mechanism and semantic matching with web videos. In this way, our VESD provides a practical solution for real-world video summarization. We further incorporate sparse models into deep architectures as structured modelling for learning powerful representations from datasets of limited size. The proposed DCSR-Net transforms a discriminative centralized sparse representation (DCSR) model into a learnable feed-forward network that can automatically impose discriminative structure on data representations. Experiments indicate that DCSR-Net can be regarded as a general and effective module for learning structured representations.|
|Rights:||All rights reserved|
Files in This Item:
|991022174659803411.pdf||For All Users||2.3 MB||Adobe PDF||View/Open|