Author: Jiang, Yalong
Title: Deep learning models for human parsing and action recognition : architectural design, model compression and data augmentation
Advisors: Chi, Zheru (EIE)
Degree: Ph.D.
Year: 2020
Subject: Hong Kong Polytechnic University -- Dissertations; Pattern recognition systems; Human activity recognition; Machine learning
Department: Department of Electronic and Information Engineering
Pages: xxvii, 209 pages : color illustrations
Language: English
Abstract: Human parsing and action recognition have long been critical techniques for visually describing human behaviour. Recent developments in Convolutional Neural Networks (CNNs) have brought significant improvements to these tasks, thanks to the availability of more training data. In this study, I focus on three major problems that hinder the application of deep learning models to human parsing and action recognition. Firstly, existing human parsing models suffer from incomplete feature representations, which may lead to failures in difficult cases. I propose two novel architectures with comprehensive feature representations to improve model robustness. The first architecture explores the relationship between human parsing and pose estimation: a pose estimation module is integrated with a human parsing module to improve performance under complex backgrounds and variations in human pose. The second architecture adopts a CNN module for depth estimation that pre-processes input images for the segmentation module, which can improve pixel classification near boundaries. The availability of abundant labelled data for pose estimation and depth estimation boosts performance in human parsing. Secondly, the inappropriate capacity of a CNN model and insufficient training data both contribute to failures in perceiving the semantic information of detailed regions. A high-capacity model cannot generalize to the variations in human parsing and action recognition. In my work, three novel methods for reducing the complexity of convolutional layers are proposed. The first method applies orthogonal weight normalization for weight initialization, which improves performance while reducing complexity. The second method adjusts the dependency among convolutional kernels by conducting principal component analysis on the kernels. The third method clusters the convolutional kernels in each layer based on Euclidean distance and evaluates the contribution of each cluster by examining the changes in training and test accuracy when that cluster is removed. Higher computational efficiency and better performance can be achieved at the same time, and the method can be applied to models pretrained on other tasks. Besides model compression, I further propose a method to evaluate the complexity of a human parsing task, which studies the variances in scales and locations and the consistency of predictions from different models. Additionally, a layer-wise training scheme is proposed to better exploit the potential of a CNN model. Thirdly, human parsing models are used to improve the robustness of action recognition models. I extend human parsing models to predict the correspondences between RGB images and surface-based representations of human bodies. The predictions are used to identify task-irrelevant content in the inputs, which increases the domain discrepancy. The proposed scheme reduces the discrepancy between the training data and the test data and improves performance in action recognition. The above-mentioned methods are evaluated on the Pascal Person Part dataset and the Look into Person dataset for human parsing, the COCO dataset for pose estimation, the MegaDepth dataset for depth estimation, and the HMDB-51 dataset for action recognition.
Rights: All rights reserved
Access: open access

Files in This Item:
File: 991022347054103411.pdf
Description: For All Users
Size: 6.16 MB
Format: Adobe PDF



Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/10318