Author: Mai, Weixin
Title: Self-supervised features for speech emotion recognition
Degree: M.Sc.
Year: 2023
Department: Department of Electrical and Electronic Engineering
Pages: vii, 65 pages : color illustrations
Language: English
Abstract: Through speech emotion recognition, a computer can better understand the intentions of its user and improve the user's quality of life. Recently, research on self-supervised speech representations has attracted considerable attention, and such representations achieve excellent performance in various downstream tasks. This dissertation applies the HuBERT self-supervised model to speech emotion recognition. The main work and innovations of this dissertation are as follows.
A self-supervised model is used for speech emotion recognition. The method has two parts: (1) Upstream, a self-supervised model is pre-trained on a large amount of unlabelled audio data to extract general speech features. (2) Downstream, a small amount of labelled data is used to fine-tune the pre-trained model to extract emotional speech features. Based on the HuBERT self-supervised model, this pre-training and fine-tuning approach was evaluated on the IEMOCAP dataset. The experimental results show that the accuracy of the self-supervised model for speech emotion recognition reaches 70%, a promising result that is higher than that of many prevalent systems using self-supervised models for speech emotion recognition. The performance of the adapted model is compared with that of the baseline model to demonstrate the effectiveness of self-supervised adaptation. The results show that extracting features with a self-supervised model and applying them to speech emotion classification is feasible and can effectively improve the accuracy of SER.
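As an illustration of the upstream/downstream pipeline described above, the following is a minimal sketch in Python using the Hugging Face transformers library: a pre-trained HuBERT encoder provides the general speech features, and a small linear head on top is fine-tuned for emotion classification. The checkpoint name, the mean-pooling step, and the four-class setup (a common IEMOCAP configuration) are illustrative assumptions, not the dissertation's exact recipe.

    import torch
    import torch.nn as nn
    from transformers import HubertModel, Wav2Vec2FeatureExtractor

    class HubertForSER(nn.Module):
        # Upstream HuBERT encoder plus a downstream emotion classifier.
        def __init__(self, checkpoint="facebook/hubert-base-ls960", num_classes=4):
            super().__init__()
            # Pre-trained on unlabelled audio; supplies general speech features.
            self.hubert = HubertModel.from_pretrained(checkpoint)
            # Small head fine-tuned on labelled emotion data.
            self.classifier = nn.Linear(self.hubert.config.hidden_size, num_classes)

        def forward(self, input_values):
            # (batch, samples) raw 16 kHz waveform -> (batch, frames, hidden)
            hidden = self.hubert(input_values).last_hidden_state
            # Mean-pool over time to obtain one utterance-level vector.
            pooled = hidden.mean(dim=1)
            return self.classifier(pooled)

    extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
    model = HubertForSER()
    waveform = torch.randn(16000)          # stand-in for 1 s of 16 kHz speech
    inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
    logits = model(inputs.input_values)    # (1, num_classes) emotion logits

In practice the HuBERT weights may be frozen or fully fine-tuned; the 70% accuracy reported above refers to the dissertation's own configuration, not to this sketch.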
Rights: All rights reserved
Access: restricted access

Files in This Item:
  File:        8277.pdf
  Description: For All Users (off-campus access for PolyU Staff & Students only)
  Size:        1.15 MB
  Format:      Adobe PDF


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13872