|Facial affect recognition : from feature engineering to deep learning
|Chi, Zheru (EIE)
|Hong Kong Polytechnic University -- Dissertations
Human face recognition (Computer science)
Pattern recognition systems
|Department of Electronic and Information Engineering
|xxii, 152 pages : color illustrations
|Facial expression recognition has long been a challenging problem and has attracted growing interest from the affective computing community. This thesis presents my research on facial affect recognition using novel hand-crafted features and deep learning. It reports three main contributions: (1) an effective approach with novel features for facial expression recognition in video; (2) a multi-task framework for detecting and locating pain events in video; and (3) an effective method based on a deep convolutional neural network for smile detection in the wild.
In the first investigation, I propose novel features and apply multi-kernel learning to combine multiple features for facial expression recognition in video. A new feature descriptor, the Histogram of Oriented Gradients from Three Orthogonal Planes (HOG-TOP), is proposed to characterize facial appearance changes. A new, effective geometric feature is also proposed to capture facial configuration changes. The role of the audio modality in affect recognition is explored as well, and multiple feature fusion is used to combine the different features optimally. Experimental results show that, compared with other state-of-the-art methods, our approach is robust for video-based facial expression recognition both in lab-controlled environments and in the wild.
In the second investigation, I propose an effective multi-task framework for detecting and locating pain events. Histograms of Oriented Gradients (HOG) computed at fiducial points (P-HOG) and HOG-TOP are used to characterize spatial features and dynamic textures in video frames and video segments, respectively. Both frame-level and segment-level detection rely on trained Support Vector Machines (SVMs). Max pooling is then used to obtain global P-HOG and global HOG-TOP descriptors, and an SVM with multiple kernels is trained for pain event detection. Finally, an effective probabilistic fusion method is proposed to integrate the three tasks (frame, segment and sequence level) to locate pain events in video. Experimental results show that the proposed method outperforms other state-of-the-art methods in both pain event detection and pain event localization in video.
In the third investigation, I propose an effective deep learning approach for smile detection in the wild. Deep learning can combine feature learning and classification in a single model. In this study, a deep convolutional network called Smile-CNN performs feature learning and smile detection simultaneously. I also examine the discriminative power of the features learned by the Smile-CNN model: training an SVM or AdaBoost classifier on these learned features shows that they are strongly discriminative. Experimental results show that the proposed approach achieves promising performance in smile detection.
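The core idea behind HOG-TOP (orientation histograms of intensity gradients computed on the XY, XT and YT planes of a video volume and concatenated), together with the max pooling used to build global descriptors, can be illustrated with a minimal pure-Python sketch. It is not the thesis implementation: the function names, the 9 unsigned orientation bins, the single cell per plane (no cell grid or block normalization) and the finite-difference gradients are all assumptions made for illustration.

```python
import math


def _gradient(volume, axis):
    """Finite-difference derivative of a T x H x W volume (nested lists) along
    axis 0 (t), 1 (y) or 2 (x): central differences inside, one-sided at edges."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    n = (T, H, W)[axis]
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    if n < 2:
        return out  # only one sample along this axis: derivative taken as 0
    for t in range(T):
        for y in range(H):
            for x in range(W):
                idx = [t, y, x]
                i = idx[axis]
                lo, hi = max(i - 1, 0), min(i + 1, n - 1)
                fwd, bwd = idx.copy(), idx.copy()
                fwd[axis], bwd[axis] = hi, lo
                diff = volume[fwd[0]][fwd[1]][fwd[2]] - volume[bwd[0]][bwd[1]][bwd[2]]
                out[t][y][x] = diff / (hi - lo)
    return out


def _flatten(volume):
    return [v for frame in volume for row in frame for v in row]


def _orientation_histogram(g_a, g_b, n_bins):
    """Magnitude-weighted histogram of unsigned orientations atan2(g_b, g_a)
    over [0, pi), L2-normalized (left as zeros if there is no gradient)."""
    hist = [0.0] * n_bins
    for a, b in zip(g_a, g_b):
        mag = math.hypot(a, b)
        if mag == 0.0:
            continue
        theta = math.atan2(b, a) % math.pi  # fold signed angle into [0, pi)
        k = min(int(theta / math.pi * n_bins), n_bins - 1)
        hist[k] += mag
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist


def hog_top(volume, n_bins=9):
    """Simplified HOG-TOP (hypothetical helper): one orientation histogram per
    orthogonal plane, concatenated as [XY | XT | YT] (3 * n_bins values)."""
    g_t = _flatten(_gradient(volume, 0))
    g_y = _flatten(_gradient(volume, 1))
    g_x = _flatten(_gradient(volume, 2))
    h_xy = _orientation_histogram(g_x, g_y, n_bins)  # spatial appearance
    h_xt = _orientation_histogram(g_x, g_t, n_bins)  # x structure over time
    h_yt = _orientation_histogram(g_y, g_t, n_bins)  # y structure over time
    return h_xy + h_xt + h_yt


def global_max_pool(descriptors):
    """Element-wise max over per-frame or per-segment descriptors (hypothetical helper)."""
    return [max(column) for column in zip(*descriptors)]


if __name__ == "__main__":
    # A 3-frame, 4x5 clip that brightens uniformly over time (intensity = t).
    clip = [[[float(t)] * 5 for _ in range(4)] for t in range(3)]
    feat = hog_top(clip)
    print(len(feat), feat[9 + 4], feat[18 + 4])
    print(global_max_pool([[0.1, 0.5], [0.3, 0.2]]))
```

Running the demo prints `27 1.0 1.0` and `[0.3, 0.5]`: a clip that only brightens over time puts all of its gradient energy into the bin containing 90 degrees of the XT and YT histograms and none into the XY histogram. Descriptors of this family usually also divide the region into cells or blocks before concatenation; that step is omitted here for brevity.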
|All rights reserved
|For All Users