Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor | Department of Electronic and Information Engineering | en_US |
dc.contributor.advisor | Chi, Zheru (EIE) | - |
dc.creator | Jiang, Yalong | - |
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/10318 | - |
dc.language | English | en_US |
dc.publisher | Hong Kong Polytechnic University | - |
dc.rights | All rights reserved | en_US |
dc.title | Deep learning models for human parsing and action recognition : architectural design, model compression and data augmentation | en_US |
dcterms.abstract | Methods for human parsing and action recognition have long been critical techniques for visually describing human behaviours. Recent developments in Convolutional Neural Networks (CNNs) have brought significant improvements to these tasks, thanks to the availability of an increased amount of training data. In this study, I focus on three major problems that hinder the application of deep learning models to human parsing and action recognition. Firstly, existing human parsing models suffer from incomplete feature representations, which may lead to failures in some difficult cases. I propose two novel architectures with comprehensive feature representations to improve the robustness of models. The first architecture explores the relationship between human parsing and pose estimation: a pose estimation module is integrated with a human parsing module to improve performance under complex backgrounds and variations in human poses. The second architecture adopts a CNN module for depth estimation that pre-processes input images for the segmentation module, which can improve pixel classification near boundaries. The availability of abundant labelled data for pose estimation and depth estimation boosts performance in human parsing. Secondly, the inappropriate capacity of a CNN model and insufficient training data both contribute to failures in perceiving the semantic information of detailed regions. A high-capacity model cannot generalize to the variations in human parsing and action recognition. In my work, three novel methods for reducing the complexity of convolutional layers are proposed. The first method applies orthogonal weight normalization for weight initialization, which improves performance while reducing complexity. The second method adjusts the dependency among convolutional kernels by conducting principal component analysis on the kernels. The third method clusters the convolutional kernels in each layer based on the Euclidean distance and evaluates the contributions of different clusters by examining the changes in training and test accuracy upon removing each cluster. Higher computational efficiency and better performance can be achieved at the same time. This method can be applied to models that are pretrained on other tasks. Besides model compression, I further propose a method to evaluate the complexity of a human parsing task: the variances in scales and locations and the consistency of predictions from different models are studied. Additionally, a layer-wise training scheme is proposed to better explore the potential of a CNN model. Thirdly, human parsing models are used to improve the robustness of action recognition models. I extend human parsing models to predict the correspondences between RGB images and the surface-based representations of human bodies. The predictions are used to identify the task-irrelevant content in inputs, which increases the domain discrepancy. The proposed scheme reduces the discrepancy between the training data and the test data and improves performance in action recognition. The above-mentioned methods are evaluated on the Pascal Person Part dataset and the Look into Person dataset for human parsing, the COCO dataset for pose estimation, the MegaDepth dataset for depth estimation, and the HMDB-51 dataset for action recognition. | en_US |
dcterms.extent | xxvii, 209 pages : color illustrations | en_US |
dcterms.isPartOf | PolyU Electronic Theses | en_US |
dcterms.issued | 2020 | en_US |
dcterms.educationalLevel | Ph.D. | en_US |
dcterms.educationalLevel | All Doctorate | en_US |
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | en_US |
dcterms.LCSH | Pattern recognition systems | en_US |
dcterms.LCSH | Human activity recognition | en_US |
dcterms.LCSH | Machine learning | en_US |
dcterms.accessRights | open access | en_US |
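
The abstract's third compression method (clustering the kernels of a layer by Euclidean distance, then measuring the accuracy change when a cluster is removed) can be illustrated with a minimal sketch. The record does not name the clustering algorithm or the evaluation procedure, so everything below is an assumption made for illustration: plain k-means stands in for the clustering step, kernels are given as flattened lists of weights, and `cluster_kernels`, `cluster_contributions`, and the `accuracy_without` callback are hypothetical names, not the thesis's code. The abstract tracks both training and test accuracy; the sketch uses a single accuracy callback for brevity.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two flattened kernels."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_kernels(kernels, k, iters=20):
    """Assumed stand-in: plain k-means over flattened kernels.

    Returns one cluster label per kernel. Centroids are initialised
    deterministically from the first k kernels.
    """
    centroids = [list(v) for v in kernels[:k]]
    labels = [0] * len(kernels)
    for _ in range(iters):
        # Assign each kernel to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(v, centroids[c]))
            for v in kernels
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(kernels, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_contributions(labels, k, accuracy_without, baseline):
    """Accuracy drop observed when each cluster is removed.

    accuracy_without is a caller-supplied (hypothetical) function that
    takes the kernel indices to remove and returns the resulting accuracy.
    """
    drops = {}
    for c in range(k):
        removed = [i for i, lab in enumerate(labels) if lab == c]
        drops[c] = baseline - accuracy_without(removed)
    return drops
```

In this sketch, a cluster whose removal causes little or no accuracy drop would be a candidate for pruning; how the thesis actually selects clusters is not stated in this record.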
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
991022347054103411.pdf | For All Users | 6.16 MB | Adobe PDF |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/10318