Author: | Turan, Cigdem |
Title: | Facial image analysis and its applications to facial expression recognition |
Advisors: | Lam, Kin-man (EIE) |
Degree: | Ph.D. |
Year: | 2018 |
Subject: | Hong Kong Polytechnic University -- Dissertations ; Human face recognition (Computer science) ; Face perception -- Data processing
Department: | Department of Electronic and Information Engineering |
Pages: | xxii, 157 pages : color illustrations |
Language: | English |
Abstract: | Facial expression recognition (FER), the task of identifying a person's emotional or affective state from face images, has been studied widely over the last few decades. With the development of computer-vision techniques and the availability of greater computational power, FER methods can now achieve excellent performance on posed expressions; for instance, the latest recognition rates on the widely used Extended Cohn-Kanade (CK+) database, which consists of posed expressions, exceed 98%. However, the many potential applications of facial behavior analysis, ranging from advertising to teaching and from pain detection to lie detection, require more sophisticated methods that can handle spontaneous expressions and real-life conditions, such as variations in pose, intensity and illumination. The objectives of this thesis are therefore to investigate the basic steps of FER, namely classification and face representation, including feature extraction, dimensionality reduction and feature fusion, and to review and develop efficient, robust methods that improve FER performance.

We first present histogram-based local descriptors applied to FER from static images and provide a systematic review and analysis of them. We begin by introducing a taxonomy for histogram-based local descriptors, highlighting representative examples of their specific steps and analyzing their strengths and weaknesses. We then compare the performance of 27 local descriptors on four popular databases under the same experimental set-up, including two classifiers, different image resolutions and different numbers of sub-regions. Beyond accuracy, other important aspects, such as the face resolution at which each descriptor performs best, are also studied. Moreover, we compare the results achieved by handcrafted features, e.g. histogram-based local features, with those obtained by feature learning and by state-of-the-art deep features, and we evaluate the robustness of the respective local descriptors in a cross-dataset FER setting. This part of the thesis brings together different studies of visual features for FER by evaluating their performance under the same experimental set-up and by critically reviewing the classifiers that make use of the local descriptors.

Having reviewed the existing local descriptors under different settings, we propose several methods for FER. In the literature, features extracted from two or more modalities, such as audio, video or images, have been combined for FER, as have features extracted from different regions of the same face image. In this thesis, we propose a two-level classification framework with region-based feature fusion. In the first level, features from the eye and mouth windows, the most salient facial regions for representing expressions, are concatenated to form an augmented expression feature. If the expression of a query input cannot be determined confidently by a Support Vector Machine classifier, the second-level features are obtained by fusion using Canonical Correlation Analysis (CCA), which can explore and enhance the correlation between the eye and mouth features, since these two regions should be highly correlated when describing a specific facial expression. A minimal sketch of this two-level scheme is given below.
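The following is a minimal sketch of the two-level idea, assuming pre-extracted eye and mouth descriptors; the toy data, the linear-kernel SVMs, the 16 CCA components and the 0.6 confidence threshold are illustrative assumptions, not the configuration used in the thesis.

```python
# Sketch of two-level classification with CCA-based region fusion.
# Toy data and all hyper-parameters below are illustrative placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n, d = 200, 64                                   # stand-ins for real descriptors
X_eye, X_mouth = rng.normal(size=(n, d)), rng.normal(size=(n, d))
y = rng.integers(0, 6, size=n)                   # six basic expression classes

# Level 1: SVM on the concatenated (augmented) eye+mouth feature.
svm1 = SVC(kernel="linear", probability=True).fit(np.hstack([X_eye, X_mouth]), y)

# Level 2: CCA projects both regions into a shared space that maximizes their
# correlation; the projected features are fused and classified again.
cca = CCA(n_components=16).fit(X_eye, X_mouth)
Z_eye, Z_mouth = cca.transform(X_eye, X_mouth)
svm2 = SVC(kernel="linear", probability=True).fit(np.hstack([Z_eye, Z_mouth]), y)

def predict(x_eye, x_mouth, conf_threshold=0.6):
    """Fall back to the CCA-fused classifier only when level 1 is unsure."""
    p1 = svm1.predict_proba(np.hstack([x_eye, x_mouth])[None, :])[0]
    if p1.max() >= conf_threshold:
        return int(p1.argmax())
    z_eye, z_mouth = cca.transform(x_eye[None, :], x_mouth[None, :])
    return int(svm2.predict_proba(np.hstack([z_eye, z_mouth]))[0].argmax())

print(predict(X_eye[0], X_mouth[0]))
```

The design point is that the more expensive correlation-based fusion is consulted only for queries on which the first-level classifier is uncertain.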
Many image features and descriptors have been proposed for FER, and they achieve different recognition rates; moreover, different descriptors perform best for different expression classes. In this thesis, we propose an emotion-based feature-fusion method using the Discriminant Analysis of Canonical Correlations (DCC) together with an adaptive descriptor-selection algorithm. The selection algorithm determines the best two features for each expression class on a given training set, and these two features are then fused so as to achieve a higher recognition rate for that expression. Our aim is to find the most discriminative features for recognizing each facial expression by combining different descriptors. To the best of our knowledge, we are the first to use different coherent descriptors for the recognition of different expressions.

Dimensionality reduction is fundamental to any classification task, since many real-world computer-vision and pattern-recognition applications involve large volumes of high-dimensional data. In this thesis, we propose a new and more efficient manifold-learning method, named Soft Locality Preserving Map (SLPM). SLPM is a graph-based subspace-learning method that uses k-neighborhood and class information. Its key feature is that it controls the level of spread of the different classes, because the spread of the classes in the underlying manifold is closely connected to the generalizability of the learned subspace. We also propose an efficient feature-generation scheme that further enhances the generalizability of the manifolds of the different expression classes, so that each expression manifold is represented more completely; a schematic of this style of graph-based subspace learning is given at the end of this abstract.

The automatic recognition and interpretation of facial expressions has much to offer various disciplines, such as psychology, cognitive science and neuroscience. These potential applications have motivated us to extend our research scope to interdisciplinary studies. To this end, we designed a behavioral experiment to study the multidimensionality of comprehension shown by facial expressions, bringing together studies, results and questions from different disciplines through the computational analysis of human behavior. A new multimodal facial expression database, named Facial Expressions of Comprehension (FEC) and consisting of the videos recorded during the behavioral experiments, has been created and released for further academic research. The multidimensionality of comprehension is analyzed in two respects: 1) the level of comprehension shown by expressions, and 2) the level of engagement with the corresponding feedback. We also propose a new methodology, named Event-Related Intensities (ERIs), which explores the changes in facial configuration caused by an event.

All the methods proposed in this thesis have been evaluated and compared with state-of-the-art methods. Experimental results and comprehensive analyses show that our algorithms and frameworks achieve convincing and consistent performance. |
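As a rough illustration of the supervised, graph-based subspace learning that SLPM builds on, the sketch below constructs within-class and between-class k-NN graphs and solves a generalized eigenproblem. The binary affinity weights, the small regularizer and the toy data are simplified assumptions; the actual SLPM objective, with its spread-control mechanism, is defined in the thesis.

```python
# Schematic of LPP-style supervised subspace learning (not the exact SLPM
# formulation): attract same-class neighbors, repel different-class neighbors.
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import NearestNeighbors

def fit_subspace(X, y, k=5, dim=10):
    n = X.shape[0]
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    W_w = np.zeros((n, n))                       # within-class graph (attract)
    W_b = np.zeros((n, n))                       # between-class graph (repel)
    for i in range(n):
        for j in idx[i, 1:]:                     # idx[:, 0] is the point itself
            if y[i] == y[j]:
                W_w[i, j] = W_w[j, i] = 1.0      # simplified binary weights
            else:
                W_b[i, j] = W_b[j, i] = 1.0
    L_w = np.diag(W_w.sum(1)) - W_w              # graph Laplacians
    L_b = np.diag(W_b.sum(1)) - W_b
    # Minimize within-class scatter relative to between-class scatter:
    # solve X^T L_w X v = lambda (X^T L_b X + eps I) v for the smallest lambda.
    A = X.T @ L_w @ X
    B = X.T @ L_b @ X + 1e-6 * np.eye(X.shape[1])
    _, vecs = eigh(A, B)                         # eigenvalues in ascending order
    return vecs[:, :dim]                         # projection matrix

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 30))
y = rng.integers(0, 6, size=120)
P = fit_subspace(X, y)
print((X @ P).shape)                             # (120, 10)
```

Minimizing the within-class Laplacian form relative to the between-class one keeps same-class neighbors close in the learned subspace while pushing different-class neighbors apart, which is the property the abstract links to the generalizability of the learned manifold.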
Rights: | All rights reserved |
Access: | open access |
Files in This Item:
File | Description | Size | Format
---|---|---|---
991022173537203411.pdf | For All Users | 3.67 MB | Adobe PDF
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/9757