Deep hierarchical architectures for saliency prediction and salient object detection

Hu, Yu

Full metadata record

DC Field	Value	Language
dc.contributor	Department of Electronic and Information Engineering	en_US
dc.contributor.advisor	Chi, Zheru (EIE)	-
dc.creator	Hu, Yu	-
dc.identifier.uri	https://theses.lib.polyu.edu.hk/handle/200/8757	-
dc.language	English	en_US
dc.publisher	Hong Kong Polytechnic University	-
dc.rights	All rights reserved	en_US
dc.title	Deep hierarchical architectures for saliency prediction and salient object detection	en_US
dcterms.abstract	This thesis presents hierarchical deep architectures for saliency prediction and salient object detection in natural scenes. Three major contributions are reported in the thesis: (1) a deep architecture based on a Convolutional Neural Network (CNN) for saliency prediction, (2) a hybrid pixel-based and segmentation-based approach for salient object detection, and (3) a CNN-based model that learns the heat maps of human eye gaze data. In the first investigation, an Adaptive Saliency Model (ASM) based on a CNN is proposed for saliency prediction. The model consists of convolutional layers, subsampling layers and a Two-Dimensional Output (TDO) layer that predicts image saliency. The kernels in the CNN perform feature extraction, and the TDO layer generates the saliency map. ASM uses two levels of kernels in its two convolutional layers: the first-level kernels learn low-level features from the input image, while the second-level kernels capture high-level features. This thesis studies how to train ASM to generate a saliency map with only the original image as input. In my study, I explore the Long-Term-Dependency (LTD) problem encountered when training ASM with a Gradient Descent (GD) Back-Propagation (BP) algorithm. Resilient Propagation (Rprop) shows superior performance in training ASM. Various aspects, including saliency prediction performance, training time and the number of kernels in the convolutional layers, are systematically studied. Experimental results on the Object and Semantic Images and Eye-tracking (OSIE) dataset, evaluated with the Histogram Intersection (HI) metric, demonstrate the effectiveness of my proposed ASM compared with state-of-the-art saliency prediction algorithms.	en_US
dcterms.abstract	In the second investigation, I propose a hybrid Salient Object Detection (SOD) model that combines a modified ASM with potential Region-Of-Interest (p-ROI) approximation. Unlike the ASM in the first investigation, which requires ground truth with continuous saliency values for training, the ASM in this investigation needs only binary ground truth to detect salient objects. Specifically, ASM assigns saliency values to the pixels of the input image, and p-ROI validates the salient region with a segmentation approach. Both ASM and p-ROI contribute to the improvement in object detection performance: ASM refines the performance of p-ROI by targeting details, while p-ROI enhances the capability of ASM by exploring the entire input image. The precision-recall curve and the Area Under the Curve (AUC) are adopted to evaluate the performance of my SOD approach. Experimental results on a dataset with manually demarcated ground truth demonstrate that the hybrid SOD model outperforms each individual method. In the third investigation, ASM is utilized to learn the heat maps of human eye gaze data. I first employ ASM with the Rprop algorithm to generate heat maps and show that this deep learning method achieves only moderate performance. I then modify the approach by pre-training the deep neural network on Itti saliency maps and show that this pre-training slightly improves performance. The precision-recall curve, the Receiver Operating Characteristic (ROC) curve and AUC are adopted to evaluate my learning model on both the OSIE dataset and the CAT2000 dataset.	en_US
dcterms.extent	xix, 134 pages : color illustrations	en_US
dcterms.isPartOf	PolyU Electronic Theses	en_US
dcterms.issued	2016	en_US
dcterms.educationalLevel	All Master	en_US
dcterms.educationalLevel	M.Phil.	en_US
dcterms.LCSH	Image processing -- Digital techniques.	en_US
dcterms.LCSH	Image analysis.	en_US
dcterms.LCSH	Hong Kong Polytechnic University -- Dissertations	en_US
dcterms.accessRights	open access	en_US
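The abstract reports that Resilient Propagation (Rprop) trains ASM better than gradient-descent back-propagation, but it does not name the Rprop variant or its settings. The sketch below is an illustration under stated assumptions, not the thesis implementation: it uses the iRprop- variant with the commonly cited defaults (eta+ = 1.2, eta- = 0.5, step bounds 1e-6 to 50), and the objective supplied through `grad_fn` is hypothetical. Each weight moves by its own adaptive step and only the sign of its gradient is used, so a weight with a very small gradient still moves; this is one plausible reading of why Rprop helps with the LTD problem.

```python
from typing import Callable, List, Sequence


def sign(x: float) -> int:
    """Return +1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def irprop_minus(
    grad_fn: Callable[[List[float]], Sequence[float]],
    weights: Sequence[float],
    steps: int = 100,
    step_init: float = 0.1,
    eta_plus: float = 1.2,
    eta_minus: float = 0.5,
    step_min: float = 1e-6,
    step_max: float = 50.0,
) -> List[float]:
    """Minimise a function with iRprop- given its gradient (assumed variant)."""
    w = list(weights)
    step = [step_init] * len(w)   # one adaptive step size per weight
    prev_grad = [0.0] * len(w)
    for _ in range(steps):
        grad = list(grad_fn(w))   # gradient at the current weights
        for i in range(len(w)):
            agreement = prev_grad[i] * grad[i]
            if agreement > 0:
                # Same gradient sign as last time: accelerate.
                step[i] = min(step[i] * eta_plus, step_max)
            elif agreement < 0:
                # Sign flipped, so the minimum was overshot: shrink the step
                # and skip this update (the "minus" in iRprop-).
                step[i] = max(step[i] * eta_minus, step_min)
                grad[i] = 0.0
            # Only the sign of the gradient is used, never its magnitude.
            w[i] -= sign(grad[i]) * step[i]
            prev_grad[i] = grad[i]
    return w


if __name__ == "__main__":
    # Hypothetical badly scaled objective: f(w) = (w0 - 3)^2 + 100 * (w1 + 1)^2
    def grad(w: List[float]) -> List[float]:
        return [2.0 * (w[0] - 3.0), 200.0 * (w[1] + 1.0)]

    print(irprop_minus(grad, [0.0, 0.0], steps=200))  # close to [3.0, -1.0]
```

Both weights converge even though their gradient magnitudes differ by a factor of 100, because the step sizes are adapted from sign agreement alone.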
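The record names Histogram Intersection (HI) as the metric for the OSIE experiments but does not define it. A common definition, assumed here, rescales the predicted and ground-truth saliency maps so that each sums to 1 and then sums their element-wise minimum, giving a score in [0, 1] where 1 means identical distributions. Maps are plain nested lists to keep the sketch dependency-free; the function names are hypothetical.

```python
from typing import List

SaliencyMap = List[List[float]]


def to_distribution(saliency: SaliencyMap) -> List[float]:
    """Flatten a 2-D saliency map and rescale it so its values sum to 1."""
    flat = [v for row in saliency for v in row]
    if any(v < 0 for v in flat):
        raise ValueError("saliency values must be non-negative")
    total = sum(flat)
    if total == 0:
        raise ValueError("saliency map must not be all zeros")
    return [v / total for v in flat]


def histogram_intersection(predicted: SaliencyMap, ground_truth: SaliencyMap) -> float:
    """Sum of element-wise minima of the two unit-sum maps (assumed HI definition)."""
    p = to_distribution(predicted)
    q = to_distribution(ground_truth)
    if len(p) != len(q):
        raise ValueError("maps must contain the same number of pixels")
    return sum(min(a, b) for a, b in zip(p, q))


if __name__ == "__main__":
    predicted = [[0.0, 1.0], [1.0, 2.0]]
    ground_truth = [[0.0, 2.0], [2.0, 4.0]]  # same distribution, different scale
    print(histogram_intersection(predicted, ground_truth))  # 1.0
```

Because both maps are rescaled first, the score ignores overall intensity and compares only how saliency is distributed across the image.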
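For evaluation against binary ground truth, such as the manually demarcated masks in the second investigation, a saliency map can be turned into precision-recall and ROC curves by thresholding it at a range of values and comparing each resulting mask with the ground truth pixel by pixel. The sketch below assumes flattened score and label lists, a fixed threshold grid, precision set to 1.0 when no pixel is predicted salient, and trapezoidal integration for AUC. The record does not state which of these choices the thesis makes, and linear interpolation between PR points is only an approximation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sweep(
    scores: Sequence[float], labels: Sequence[int], thresholds: Sequence[float]
) -> Tuple[List[Point], List[Point]]:
    """Return PR points (recall, precision) and ROC points (FPR, TPR),
    ordered from the highest threshold to the lowest."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("ground truth needs both salient and background pixels")
    pr: List[Point] = []
    roc: List[Point] = []
    for t in sorted(thresholds, reverse=True):
        # A pixel is predicted salient when its score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        recall = tp / positives  # also the true positive rate
        precision = tp / (tp + fp) if tp + fp else 1.0  # convention: nothing predicted
        pr.append((recall, precision))
        roc.append((fp / negatives, recall))
    return pr, roc


def trapezoid_area(points: Sequence[Point]) -> float:
    """Area under a curve whose x values are non-decreasing in list order."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.1]  # toy flattened saliency map
    labels = [1, 1, 0, 0]          # toy flattened binary ground truth
    pr, roc = sweep(scores, labels, [i / 10 for i in range(11)])
    print(trapezoid_area(pr), trapezoid_area(roc))  # 1.0 1.0
```

Walking the thresholds from high to low makes recall and false positive rate non-decreasing, so the points can be integrated in list order without re-sorting, which would otherwise misplace ties at equal recall.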

Files in This Item:

File	Description	Size	Format
b29256070.pdf	For All Users	4.37 MB	Adobe PDF

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/8757