| Author: | Huang, Zilong |
| Title: | GAM-NodeFormer: graph-attention multi-modal emotion recognition in conversation with node transformer |
| Advisors: | Mak, M. W. (EEE) |
| Degree: | M.Sc. |
| Year: | 2023 |
| Department: | Department of Electrical and Electronic Engineering |
| Pages: | vi, 42 pages : color illustrations |
| Language: | English |
| Abstract: | Emotion Recognition in Conversation (ERC) has great prospects in areas such as human-computer interaction and medical counseling. In dialogue videos, a speaker's emotion can be expressed through different modalities, including text, speech, and visual cues. For multimodal ERC, the fusion of the different modalities is crucial. Existing multimodal ERC approaches often concatenate multimodal features without considering the differences in the emotion information carried by the individual modalities. In particular, little attention has been paid to balancing the contributions of the dominant and auxiliary modalities, leading to suboptimal multimodal fusion. To address these issues, we propose a multimodal network called GAM-NodeFormer for conversational emotion recognition. The network leverages the features at different stages of a Transformer encoder and performs feature fusion at multiple stages. Specifically, in the early fusion stage, we introduce a NodeFormer module for multimodal feature fusion. The module uses a Transformer-based fusion mechanism to combine emotion features extracted from the visual, audio, and textual modalities; it exploits the advantages of the dominant modality and enhances the complementarity between modalities. The fused features are then updated by a graph neural network that models the conversational context. For the late fusion stage, we design a graph attention module that refines the multimodal features from before and after the graph update, thereby improving the quality of the final fused features. To evaluate the proposed model, we conducted extensive experiments on two public benchmark datasets, MELD and IEMOCAP. The results show that the proposed model achieves new state-of-the-art performance in ERC, demonstrating its effectiveness. (A schematic sketch of this pipeline appears after this table.) |
| Rights: | All rights reserved |
| Access: | restricted access |
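
The abstract above describes a three-stage pipeline: Transformer-based early fusion of the text, audio, and visual features (the NodeFormer module), a graph-neural-network update that propagates conversational context, and a graph attention module that refines the features from before and after the graph update. The thesis code is not part of this record, so the following is only a minimal PyTorch sketch of that overall shape under stated assumptions: the module names, feature dimension, the mean-aggregation graph update, and the two-token attention used for late fusion are all illustrative choices, not the author's implementation.

```python
import torch
import torch.nn as nn

class NodeFormerFusion(nn.Module):
    """Early fusion: treat each modality feature as one token and let a
    small Transformer encoder exchange information between modalities.
    (Hypothetical stand-in for the thesis's NodeFormer module.)"""
    def __init__(self, dim: int = 128, n_heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, text, audio, visual):
        tokens = torch.stack([text, audio, visual], dim=1)  # (N, 3, dim)
        fused = self.encoder(tokens)
        return fused.mean(dim=1)                            # (N, dim)

class GraphUpdate(nn.Module):
    """One round of message passing over a fully connected conversation
    graph (each utterance mixes with the mean of all utterances); a much
    simplified stand-in for the thesis's graph neural network."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, x):                                   # x: (N, dim)
        ctx = x.mean(dim=0, keepdim=True).expand_as(x)      # dialogue context
        return torch.relu(self.proj(torch.cat([x, ctx], dim=-1)))

class GraphAttentionLateFusion(nn.Module):
    """Late fusion: multi-head attention over the pre- and post-update
    features of each utterance, then pool the two refined tokens."""
    def __init__(self, dim: int = 128, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, pre, post):
        pair = torch.stack([pre, post], dim=1)              # (N, 2, dim)
        refined, _ = self.attn(pair, pair, pair)
        return refined.mean(dim=1)                          # (N, dim)

if __name__ == "__main__":
    N, dim, n_classes = 10, 128, 7     # 10 utterances, 7 MELD emotion classes
    text, audio, visual = (torch.randn(N, dim) for _ in range(3))

    early = NodeFormerFusion(dim)
    gnn = GraphUpdate(dim)
    late = GraphAttentionLateFusion(dim)
    classifier = nn.Linear(dim, n_classes)

    pre = early(text, audio, visual)   # early multimodal fusion
    post = gnn(pre)                    # context propagation on the graph
    logits = classifier(late(pre, post))
    print(logits.shape)                # torch.Size([10, 7])
```

The sketch keeps the key structural idea from the abstract, namely that fusion happens twice: once across modalities before the graph update, and once across the pre- and post-update views of each utterance afterwards.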
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| 8263.pdf | For All Users (off-campus access for PolyU Staff & Students only) | 1.28 MB | Adobe PDF |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/13910

