Deep learning models for human parsing and action recognition : architectural design, model compression and data augmentation

Jiang, Yalong

Full metadata record

DC Field	Value	Language
dc.contributor	Department of Electronic and Information Engineering	en_US
dc.contributor.advisor	Chi, Zheru (EIE)	-
dc.creator	Jiang, Yalong	-
dc.identifier.uri	https://theses.lib.polyu.edu.hk/handle/200/10318	-
dc.language	English	en_US
dc.publisher	Hong Kong Polytechnic University	-
dc.rights	All rights reserved	en_US
dc.title	Deep learning models for human parsing and action recognition : architectural design, model compression and data augmentation	en_US
dcterms.abstract	Human parsing and action recognition have long been critical techniques for visually describing human behaviours. Recent developments in Convolutional Neural Networks (CNNs) have brought significant improvements to these tasks, thanks to the increased availability of training data. In this study, I focus on three major problems which hinder the application of deep learning models to human parsing and action recognition. Firstly, existing human parsing models suffer from incomplete feature representations, which may lead to failures in some difficult cases. I propose two novel architectures with comprehensive feature representations to improve the robustness of models. The first architecture explores the relationship between human parsing and pose estimation. A module for pose estimation is integrated with a human parsing module to improve the performance under complex backgrounds and variations in human poses. The second architecture adopts a CNN module for depth estimation which pre-processes input images for the segmentation module. It can improve the pixel classification near boundaries. The availability of abundant labelled data in pose estimation and depth estimation boosts the performance in human parsing. Secondly, the inappropriate capacity of a CNN model and insufficient training data both contribute to failures in perceiving the semantic information of detailed regions. A high-capacity model cannot generalize to the variations in human parsing and action recognition. In my work, three novel methods to reduce the complexity of convolutional layers are proposed. The first method applies orthogonal weight normalization for weight initialization. It improves performance while reducing complexity. The second method adjusts the dependency among convolutional kernels by conducting principal component analysis on the kernels.
The third method clusters the convolutional kernels in each layer based on the Euclidean distance and evaluates the contributions of different clusters by examining the changes in training and test accuracy upon removing the clusters. Higher computational efficiency and better performance can be achieved at the same time. This method can be applied to models that are pretrained on other tasks. Besides model compression, I further propose a method to evaluate the complexity of a human parsing task. The variances in scale and location and the consistency of predictions from different models are studied. Additionally, a layer-wise training scheme is proposed to better explore the potential of a CNN model. Thirdly, human parsing models are used for improving the robustness of action recognition models. I extend human parsing models to predict the correspondences between RGB images and the surface-based representations of human bodies. The predictions are used for determining the task-irrelevant content in the inputs, which increases the domain discrepancy. The proposed scheme reduces the discrepancy between the training data and the test data and improves the performance in action recognition. The above-mentioned methods are evaluated on the Pascal Person Part dataset and the Look into Person dataset for human parsing, the COCO dataset for pose estimation, the MegaDepth dataset for depth estimation, and the HMDB-51 dataset for action recognition.	en_US
dcterms.extent	xxvii, 209 pages : color illustrations	en_US
dcterms.isPartOf	PolyU Electronic Theses	en_US
dcterms.issued	2020	en_US
dcterms.educationalLevel	Ph.D.	en_US
dcterms.educationalLevel	All Doctorate	en_US
dcterms.LCSH	Hong Kong Polytechnic University -- Dissertations	en_US
dcterms.LCSH	Pattern recognition systems	en_US
dcterms.LCSH	Human activity recognition	en_US
dcterms.LCSH	Machine learning	en_US
dcterms.accessRights	open access	en_US
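
As an illustration of the third compression method named in the abstract (clustering the convolutional kernels of a layer by Euclidean distance, then scoring each cluster by what changes when it is removed), the following is a minimal sketch and not the thesis's implementation. It assumes plain k-means as the clustering algorithm, since the abstract does not name one. The function names and the `evaluate` callback, a stand-in for measuring training or test accuracy, are hypothetical.

```python
import math
import random


def cluster_kernels(kernels, k, iters=20, seed=0):
    """Group flattened kernels with plain k-means (an assumed choice).

    kernels: list of equal-length lists of floats, one flattened kernel each.
    Returns a list holding the cluster index assigned to each kernel.
    """
    rng = random.Random(seed)
    # Initialise centroids from k distinct kernels chosen at random.
    centroids = [list(v) for v in rng.sample(kernels, k)]
    labels = [0] * len(kernels)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in kernels
        ]
        # Update step: a centroid becomes the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [v for v, lab in zip(kernels, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def drop_cluster(kernels, labels, cluster):
    """Return the kernels that remain after removing one cluster."""
    return [v for v, lab in zip(kernels, labels) if lab != cluster]


def cluster_contributions(kernels, labels, evaluate):
    """Score each cluster as the drop in `evaluate` when it is removed.

    `evaluate` is a hypothetical callback mapping a list of kernels to a
    score such as accuracy; a larger drop suggests a more useful cluster.
    """
    baseline = evaluate(kernels)
    return {
        c: baseline - evaluate(drop_cluster(kernels, labels, c))
        for c in sorted(set(labels))
    }


if __name__ == "__main__":
    # Toy data: two tight groups of 2-element "kernels".
    toy = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    toy_labels = cluster_kernels(toy, k=2)
    # len() is only a placeholder score so that the example runs end to end.
    print(toy_labels, cluster_contributions(toy, toy_labels, evaluate=len))
```

In a real network, each kernel would be a flattened weight tensor from one convolutional layer, and `evaluate` would rebuild the layer from the remaining kernels and measure accuracy on held-out data. Clusters whose removal barely changes the score are the natural candidates for pruning.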

Files in This Item:

File	Description	Size	Format
991022347054103411.pdf	For All Users	6.16 MB	Adobe PDF

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/10318