Author: | Mai, Weixin |
Title: | Self-supervised features for speech emotion recognition |
Degree: | M.Sc. |
Year: | 2023 |
Department: | Department of Electrical and Electronic Engineering |
Pages: | vii, 65 pages : color illustrations |
Language: | English |
Abstract: | Through speech emotion recognition, a computer can better understand the intentions of its user and improve quality of life. Recently, self-supervised speech representation learning has attracted considerable attention and achieved excellent performance on various downstream tasks. This dissertation applies the HuBERT self-supervised model to speech emotion recognition. The main work and contributions of this dissertation are as follows. A self-supervised model is used for speech emotion recognition in two stages: (1) Upstream, a self-supervised model is pre-trained on a large amount of unlabelled audio data to extract general speech features. (2) Downstream, a small amount of labelled data is used to fine-tune the pre-trained model to extract emotional speech features. Based on the HuBERT self-supervised model, this pre-training and fine-tuning approach was evaluated on the IEMOCAP dataset. The experimental results show that the self-supervised model reaches 70% accuracy for speech emotion recognition, a promising result that exceeds that of many prevalent systems using self-supervised models for speech emotion recognition. The performance of the adapted model is compared to the baseline model to demonstrate the effectiveness of self-supervised adaptation. The results show that extracting features with a self-supervised model and applying them to speech emotion classification is feasible and can effectively improve the accuracy of SER. |
Rights: | All rights reserved |
Access: | restricted access |
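The downstream stage described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's actual code: it assumes frame-level features of dimension 768 (the hidden size of HuBERT-Base) have already been extracted by a frozen upstream encoder, and attaches a lightweight classification head over the four emotion classes commonly used in IEMOCAP experiments.

```python
# Hypothetical sketch of the downstream classifier; the class name,
# pooling strategy, and head architecture are illustrative assumptions,
# not taken from the dissertation.
import torch
import torch.nn as nn

class EmotionClassifier(nn.Module):
    """Mean-pools frame-level SSL features into an utterance embedding,
    then maps it to emotion-class logits with a linear head."""

    def __init__(self, feat_dim: int = 768, num_classes: int = 4):
        super().__init__()
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, frames, feat_dim), e.g. HuBERT hidden states
        pooled = feats.mean(dim=1)      # (batch, feat_dim) utterance embedding
        return self.head(pooled)        # (batch, num_classes) logits

# Stand-in for upstream HuBERT output: 2 utterances, 50 frames, 768 dims.
feats = torch.randn(2, 50, 768)
logits = EmotionClassifier()(feats)
print(logits.shape)  # torch.Size([2, 4])
```

In a full pipeline the pooled features would instead come from a pre-trained HuBERT model, and fine-tuning would update either just this head or the upper encoder layers as well, using the labelled IEMOCAP utterances.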
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| 8277.pdf | For All Users (off-campus access for PolyU Staff & Students only) | 1.15 MB | Adobe PDF |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/13872