Author: Kwok, Cho Ki
Title: Understanding user engagement level during tasks via facial responses, eye gaze and mouse movements
Advisors: Ngai, Grace (COMP)
Degree: M.Phil.
Year: 2018
Subject: Hong Kong Polytechnic University -- Dissertations
Human-computer interaction -- Measurement
Human-computer interaction -- Research
Department: Department of Computing
Pages: xiii, 107 pages : color illustrations
Language: English
Abstract: User engagement refers to the quality of the user experience (UX) on a particular task or interface. It emphasizes the positive aspects of human-computer interaction and the desire to work on the same task longer and repeatedly [10]. Users spend time, emotion, attention and effort when they interact with technologies, and a successful application or task should be able to engage users, instead of simply being a "job" that needs to be completed. User engagement is therefore a complex phenomenon that encompasses three different dimensions: (1) cognitive engagement, (2) emotional engagement and (3) behavioral engagement. Researchers measure user engagement level in different ways, such as self-reporting (e.g. questionnaires), observation (e.g. speech analysis, facial expression analysis) and web analytics (e.g. click-through rate, number of site visits, time spent). Nowadays, computers are equipped with high computational power and different kinds of sensors, which makes automated detection of human affect and mental state possible in a variety of situations. Using computers to "observe" human behaviors and using the observed information to detect levels of engagement could be useful in many situations, such as gathering feedback for interface improvement or assuring the quality of work produced by online workers (crowdsourcing) or students (e-learning). There has therefore been much previous work on detecting user engagement through various means such as facial expression, mouse movement or gaze movement. However, this work is hampered by three main challenges: (1) the constraints caused by using intrusive devices, (2) the limitations of specific tasks (such as gaming), which may produce user behavior different from daily computer usage, and (3) incomplete ground truth collected through direct survey questionnaires that capture only users' self-reported numeric level of engagement, which may not cover the three dimensions of engagement. The work presented in this thesis focuses on non-intrusive visual cues, in particular facial expressions, eye gaze and mouse cursor signals, for understanding users' level of engagement in human-computer interaction tasks. Addressing the first two limitations mentioned above, we conducted experiments and studied users' facial responses, eye gaze and mouse behaviors related to changes in engagement level during Language Learning tasks and Web Searching tasks. Non-intrusive devices, namely a mouse, a Tobii eye tracker and an off-the-shelf webcam, were used to capture users' behaviors in the experiments. Using Pearson's correlation, paired t-tests and single-factor one-way ANOVA, we select a useful feature set from the initial feature set. From this investigation, we gain a better understanding of the relationship between engagement level and user behavior. For example, facial action unit 5 ("upper lid raiser") is useful for engagement detection; we observed that this feature is indicative because sleepy users try to keep their eyes open to avoid falling asleep.
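The paragraph above names the statistical tests used to prune the initial feature set (Pearson's correlation, paired t-tests, one-way ANOVA). The following Python sketch only illustrates that kind of screening and is not taken from the thesis; the function names, data shapes and significance threshold are hypothetical.

# Illustrative sketch (not from the thesis): screening candidate behavioral
# features against engagement scores with the statistics named in the abstract.
# Feature names and the alpha threshold are hypothetical.
import numpy as np
from scipy import stats

def screen_by_correlation(features, engagement, alpha=0.05):
    """Keep features whose Pearson correlation with engagement is significant."""
    selected = []
    for name, values in features.items():
        r, p = stats.pearsonr(values, engagement)
        if p < alpha:
            selected.append((name, r))
    return selected

def paired_ttest(feature_low, feature_high):
    """Paired t-test between low- and high-engagement sessions of the same users."""
    return stats.ttest_rel(feature_low, feature_high)

def one_way_anova(*groups):
    """Single-factor one-way ANOVA across more than two engagement-level groups."""
    return stats.f_oneway(*groups)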
To address the third constraint, we collected an engagement dataset that includes a multi-dimensional measurement of ground truth. It includes the User Engagement Scale (UES) [89] as the self-reporting tool; the UES covers the three dimensions of user engagement, and the average UES scores can reliably represent the engagement level. It also includes the commonly used NASA Task Load Index (NASA-TLX) annotations for measuring cognitive workload. We further investigate the correlation between the UES and NASA-TLX sub-scale scores. We analyze facial affect in two ways. First, we measure momentary affect through the facial action units in every frame of the facial response videos. We then move to an overall affect measurement through segment-based facial features to find more representative features that cover the whole task period. The facial affect recognition model was extended into a real-life application to identify video viewers' emotion: we developed an asynchronous video-sharing platform with Emotars, which allows users to share their affect and experience with others without disclosing their real facial expressions and/or features. We analyze the user experience of this platform along four dimensions: emotion awareness, engagement, comfort and relationship. For eye gaze and mouse interaction, we use non-intrusive devices, i.e. a mouse, a Tobii eye tracker and an off-the-shelf webcam, to collect eye and mouse interaction data. We investigated using mouse features for user intention prediction, or, in other words, predicting the next type of mouse interaction event. Results show that the mouse interaction features are representative of users' behavior. Finally, we group the features into three groups according to the means of data collection: (1) webcam-based features, (2) eye-tracker-captured features, and (3) mouse cursor-based features. The performance of different combinations of modalities was evaluated. We apply machine learning techniques to build user-independent models for Language Learning tasks and Web Searching tasks separately. The findings suggest that the multimodal approach outperforms unimodal approaches in our studies. Evaluation results also demonstrate the versatility of our feature set, as it achieves reasonable engagement detection performance across different tasks.
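The paragraph above describes user-independent models and a comparison of unimodal versus multimodal feature groups. The Python sketch below shows one common way to set up such an evaluation (leave-one-user-out cross-validation); the classifier choice and the variable names for the three feature groups are hypothetical and are not taken from the thesis.

# Illustrative sketch (not from the thesis): user-independent evaluation of
# unimodal vs. multimodal feature groups via leave-one-user-out cross-validation.
# The classifier and variable names below are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def evaluate(X, y, user_ids):
    """Mean accuracy when each held-out fold contains all sessions of one user."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    scores = cross_val_score(model, X, y, groups=user_ids, cv=LeaveOneGroupOut())
    return scores.mean()

# Compare each modality alone against their concatenation (hypothetical data):
# acc_webcam = evaluate(X_webcam, y, user_ids)
# acc_eye    = evaluate(X_eye, y, user_ids)
# acc_mouse  = evaluate(X_mouse, y, user_ids)
# acc_multi  = evaluate(np.hstack([X_webcam, X_eye, X_mouse]), y, user_ids)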
Rights: All rights reserved
Access: open access

Files in This Item:
File: 991022067458403411.pdf (For All Users), 2.92 MB, Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/9304