Full metadata record
DC Field: Value [Language]
dc.contributor: Department of Electrical and Electronic Engineering [en_US]
dc.contributor.advisor: Mak, M. W. (EEE) [en_US]
dc.creator: Huang, Zilong
dc.identifier.uri: https://theses.lib.polyu.edu.hk/handle/200/13910
dc.language: English [en_US]
dc.publisher: Hong Kong Polytechnic University [en_US]
dc.rights: All rights reserved [en_US]
dc.title: GAM-NodeFormer : graph-attention multi-modal emotion recognition in conversation with node transformer [en_US]
dcterms.abstract: Emotion Recognition in Conversation (ERC) has great prospects in areas such as human-computer interaction and medical counseling. In dialogue videos, a speaker's emotion can be expressed through different modalities, including text, speech, and visual cues. For multimodal ERC, the fusion of these modalities is crucial. Existing multimodal ERC approaches often concatenate multimodal features without considering the differences in the emotion information carried by individual modalities. In particular, little attention has been paid to balancing the contributions of the dominant and auxiliary modalities, leading to suboptimal multimodal fusion. [en_US]
dcterms.abstract: To address these issues, we propose a multimodal network called GAM-NodeFormer for conversational emotion recognition. The network leverages the features at different stages of a transformer encoder and performs feature fusion at multiple stages. Specifically, in the early fusion stage, we introduce a NodeFormer module for multimodal feature fusion. The module uses a Transformer-based fusion mechanism to combine emotion features extracted from the visual, audio, and textual modalities, exploiting the advantages of the dominant modality and enhancing the complementarity between modalities. The fused features are then updated by a graph neural network to model the dialogue context. For the late fusion stage, we design a graph attention module that refines the multimodal features before and after the graph network update, thereby improving the quality of the final fused features. [en_US]
dcterms.abstract: To evaluate the proposed model, we conducted extensive experiments on two public benchmark datasets: MELD and IEMOCAP. The results show that the proposed model achieves new state-of-the-art performance in ERC, demonstrating its effectiveness and superiority. [en_US]
dcterms.extent: vi, 42 pages : color illustrations [en_US]
dcterms.isPartOf: PolyU Electronic Theses [en_US]
dcterms.issued: 2023 [en_US]
dcterms.educationalLevel: M.Sc. [en_US]
dcterms.educationalLevel: All Master [en_US]
dcterms.accessRights: restricted access [en_US]
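
The abstract above describes a three-stage pipeline: Transformer-based early fusion of the textual, acoustic, and visual utterance features (the NodeFormer module), a graph neural network update that models the dialogue context, and a graph-attention late-fusion step over the pre- and post-graph features. The sketch below is a minimal, hypothetical PyTorch illustration of that general pattern only; it is not the thesis code, and the class names, dimensions, and the simplified message-passing step are assumptions made for illustration.

```python
# Hypothetical sketch of the multi-stage fusion pattern described in the abstract.
# Not the thesis implementation; names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class EarlyFusionNodeFormer(nn.Module):
    """Transformer-based early fusion of text, audio, and visual utterance features."""

    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.proj = nn.Linear(3 * d_model, d_model)

    def forward(self, text, audio, visual):
        # Stack the three modalities as a length-3 "token" sequence per utterance,
        # let self-attention weigh dominant vs. auxiliary modalities, then flatten.
        x = torch.stack([text, audio, visual], dim=1)   # (N, 3, d)
        x = self.encoder(x)                             # (N, 3, d)
        return self.proj(x.flatten(1))                  # (N, d)


class GraphUpdate(nn.Module):
    """One round of degree-normalised message passing over a conversation graph."""

    def __init__(self, d_model=256):
        super().__init__()
        self.lin = nn.Linear(d_model, d_model)

    def forward(self, h, adj):
        # adj: (N, N) binary adjacency linking related utterances in the dialogue.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.lin(adj @ h / deg))


class LateFusionGraphAttention(nn.Module):
    """Attention-based refinement of features before and after the graph update."""

    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, pre, post):
        # Query with post-graph features; attend over the (pre, post) pair per utterance.
        kv = torch.stack([pre, post], dim=1)            # (N, 2, d)
        out, _ = self.attn(post.unsqueeze(1), kv, kv)
        return out.squeeze(1)                           # (N, d)


if __name__ == "__main__":
    n, d = 6, 256                                       # 6 utterances in one dialogue
    text, audio, visual = (torch.randn(n, d) for _ in range(3))
    adj = (torch.rand(n, n) > 0.5).float()
    adj.fill_diagonal_(1.0)

    early = EarlyFusionNodeFormer(d)
    gnn = GraphUpdate(d)
    late = LateFusionGraphAttention(d)

    pre = early(text, audio, visual)                    # early fusion
    post = gnn(pre, adj)                                # dialogue-context update
    fused = late(pre, post)                             # late fusion
    print(fused.shape)                                  # torch.Size([6, 256])
```

Treating the three modalities as a short token sequence lets self-attention learn how much weight to give the dominant modality relative to the auxiliary ones, which is the balancing issue the abstract highlights.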

Files in This Item:
File: 8263.pdf
Description: For All Users (off-campus access for PolyU Staff & Students only)
Size: 1.28 MB
Format: Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13910