Author: Zhang, Haode
Title: Few-shot intent detection with pre-trained language models: transferability, expressiveness and efficiency
Advisors: Wu, Xiao-ming (COMP)
Degree: Ph.D.
Year: 2024
Subject: Dialogue analysis
Machine learning
Natural language processing (Computer science)
Hong Kong Polytechnic University -- Dissertations
Department: Department of Computing
Pages: xiv, 95 pages : color illustrations
Language: English
Abstract: Intent detection is a fundamental component of a task-oriented dialogue system: it identifies the intent underlying a user’s utterance so that an appropriate response can be provided. Intent detection is typically formulated as a text classification task and has benefited from the success of deep learning techniques. However, acquiring a large number of annotations for training is expensive. This thesis addresses the challenge of few-shot intent detection, where the goal is to develop a highly effective intent classifier using only a limited amount of annotated data, thereby improving data efficiency.
We first study cross-domain transferability for few-shot intent detection, exploring whether abundant labeled data in a source domain and easily available unlabeled data in a target domain can be jointly utilized to train an intent classifier with reasonable performance. We investigate techniques for transferring knowledge across domains and adapting to a new domain. Leveraging the data in public intent detection datasets, we train IntentBERT, a backbone that transfers knowledge from multiple diverse intent detection domains and significantly improves performance in the target domain. With easily available unlabeled data in the target domain, the performance is further enhanced.
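To make the setup concrete, below is a minimal sketch of this kind of supervised pre-training followed by few-shot transfer, assuming a Hugging Face BERT backbone; the names num_source_intents, source_loader, support_texts, and support_labels are illustrative placeholders rather than identifiers from the thesis.

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
head = nn.Linear(encoder.config.hidden_size, num_source_intents)  # placeholder class count

def embed(texts):
    # Use the [CLS] token embedding as the utterance representation.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    return encoder(**batch).last_hidden_state[:, 0]

# Stage 1: supervised pre-training on abundant labeled source-domain utterances.
optim = torch.optim.AdamW(list(encoder.parameters()) + list(head.parameters()), lr=2e-5)
for texts, labels in source_loader:  # placeholder DataLoader over source domains
    loss = nn.functional.cross_entropy(head(embed(texts)), labels)
    optim.zero_grad()
    loss.backward()
    optim.step()

# Stage 2: few-shot classification in the target domain with the encoder frozen;
# a simple classifier is fit on the handful of labeled support examples.
with torch.no_grad():
    support_features = embed(support_texts).numpy()
clf = LogisticRegression(max_iter=1000).fit(support_features, support_labels)
```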
Next, to improve the expressiveness of IntentBERT, we focus on a particular property of pre-trained language models (PLMs): anisotropy, an undesirable geometric property of the feature space. We discover that supervised pre-training yields an anisotropic feature space, which may suppress the expressive power of the semantic representations. To mitigate this problem, we propose to enhance supervised pre-training by regularizing the feature space towards isotropy. We propose two regularizers, based on contrastive learning and on the correlation matrix respectively, and demonstrate their effectiveness through extensive experiments. By jointly performing supervised pre-training and isotropization, we achieve improved performance in few-shot intent detection.
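As an illustration of the correlation-matrix idea, the sketch below penalizes the deviation of the batch feature correlation matrix from the identity, decorrelating feature dimensions and pushing the space towards isotropy. The exact formulation and the weighting coefficient are assumptions, not the thesis's precise objective; ce_loss and cls_embeddings are placeholders for the supervised loss and the batch of [CLS] features.

```python
import torch

def correlation_regularizer(features, eps=1e-8):
    """Penalize deviation of the feature correlation matrix from the identity."""
    z = features - features.mean(dim=0, keepdim=True)   # center each dimension
    z = z / (z.std(dim=0, keepdim=True) + eps)          # scale to unit variance
    corr = (z.T @ z) / z.size(0)                        # (d, d) correlation matrix
    identity = torch.eye(corr.size(0), device=corr.device)
    return ((corr - identity) ** 2).sum()

# Joint objective: supervised cross-entropy plus the isotropy regularizer,
# with an illustrative weighting coefficient.
loss = ce_loss + 0.1 * correlation_regularizer(cls_embeddings)
```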
Then, to further improve data efficiency, we revisit the overfitting phenomenon, continual pre-training, and direct fine-tuning of PLMs in the context of few-shot intent detection. Although the prevailing approach to few-shot intent detection is continual pre-training, i.e., fine-tuning PLMs on external resources, our study demonstrates that continual pre-training may not be necessary. Specifically, we find that the overfitting issue of PLMs may not be as severe as previously believed: directly fine-tuning PLMs with only a handful of labeled examples already yields decent results, and the performance gap shrinks quickly as the amount of labeled data grows. We further enhance direct fine-tuning with context augmentation and sequential self-distillation. Comprehensive experiments on real-world benchmarks show that, given only two or more labeled samples per class, the enhanced direct fine-tuning outperforms many strong baselines that utilize external data sources for continual pre-training.
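A hedged sketch of sequential self-distillation is given below: each generation of the model is fine-tuned to match the previous generation's soft predictions in addition to the hard labels. The names fine_tune, base_model, num_generations, and few_shot_loader, as well as the temperature and weighting values, are hypothetical stand-ins rather than the thesis's exact configuration.

```python
import copy
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Blend soft-label KL against the teacher with hard-label cross-entropy.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

teacher = fine_tune(copy.deepcopy(base_model))   # generation 0: direct fine-tuning
for _ in range(num_generations):                 # placeholder generation count
    student = copy.deepcopy(base_model)
    optim = torch.optim.AdamW(student.parameters(), lr=2e-5)
    for texts, labels in few_shot_loader:        # placeholder few-shot DataLoader
        with torch.no_grad():
            t_logits = teacher(texts)
        loss = distill_loss(student(texts), t_logits, labels)
        optim.zero_grad()
        loss.backward()
        optim.step()
    teacher = student                            # the student teaches the next round
```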
Finally, to enhance computational efficiency, we study model compression for intent detection with limited labeled data. Traditional approaches to model compression, such as model pruning and distillation, typically rely on access to large amounts of data, but such datasets are not readily available in the few-shot scenario. To overcome this challenge, we propose a scheme that capitalizes on off-the-shelf generative PLMs for data augmentation. Furthermore, we introduce a vocabulary pruning technique that employs a nearest-neighbor matching scheme. Through extensive experiments, we demonstrate the efficacy of the proposed method: we can compress the model by a factor of 21, enabling deployment in resource-constrained scenarios, including mobile devices and embedded systems.
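The sketch below illustrates one plausible reading of nearest-neighbor vocabulary pruning: only tokens observed in the (augmented) task data keep their embedding rows, and every pruned token is remapped to its most similar retained token. The use of cosine similarity, and all names here, are assumptions for illustration, not the thesis's exact matching criterion.

```python
import torch
import torch.nn.functional as F

def prune_vocabulary(embeddings, keep_ids):
    """Shrink the embedding matrix to the rows in keep_ids and remap every
    pruned token to its nearest retained neighbor by cosine similarity."""
    kept = embeddings[keep_ids]                                          # (k, d)
    sims = F.normalize(embeddings, dim=1) @ F.normalize(kept, dim=1).T   # (V, k)
    nearest = sims.argmax(dim=1)            # nearest kept row for every token
    remap = {old_id: int(nearest[old_id]) for old_id in range(embeddings.size(0))}
    return kept, remap

# Example: keep only tokens seen in the augmented few-shot data
# (get_input_embeddings() is the standard transformers accessor).
# kept_matrix, remap = prune_vocabulary(
#     model.get_input_embeddings().weight.data, keep_ids)
```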
Rights: All rights reserved
Access: open access

Files in This Item:
File | Description | Size | Format
7378.pdf | For All Users | 3.34 MB | Adobe PDF



Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/12927