Transformer-based textual out-of-distribution detection : methods and analysis

Zhan, Liming

Author:	Zhan, Liming
Title:	Transformer-based textual out-of-distribution detection : methods and analysis
Advisors:	Wu, Xiao-ming (COMP)
Degree:	Ph.D.
Year:	2023
Subject:	Machine learning; Text processing (Computer science); Natural language processing (Computer science); Hong Kong Polytechnic University -- Dissertations
Department:	Department of Computing
Pages:	xvii, 121 pages : color illustrations
Language:	English
Abstract:	The success of machine learning methods heavily relies on the assumption that the test data follows a similar distribution to the training data. However, this assumption is frequently violated in real-world scenarios. Detecting distribution shifts between training and inference, referred to as out-of-distribution (OOD) detection, is crucial to prevent models from making unreliable predictions. OOD detection is particularly significant in ensuring the safe use of deep neural networks. Despite its importance and the surge of research in the vision domain, this problem is often overlooked in natural language processing (NLP). This thesis aims to address this gap by proposing and evaluating novel Transformer-based OOD detection approaches for various NLP classification tasks, such as dialogue intent detection, topic classification, sentiment classification, and question classification. First, we present an efficient end-to-end learning framework to reduce the complexity of training textual OOD detectors. Since the distribution of OOD samples is arbitrary and unknown in the training stage, previous methods commonly rely on strong assumptions about the data distribution, such as a mixture of Gaussians, to perform inference, resulting in either complex multi-step training procedures or hand-crafted rules such as confidence threshold selection for OOD detection. To develop a simplified learning paradigm for textual OOD detection, we propose to train a (K+1)-way discriminative classifier by simulating the test scenario during training. Specifically, we construct a set of pseudo OOD samples in the training stage by generating synthetic OOD samples from in-distribution (ID) features via self-supervision and by sampling OOD sentences from easily available open-domain datasets. The pseudo outliers are used to train a discriminative classifier that can be applied directly to the test task and generalizes well on it.
Second, we address the challenge of low-resource settings for textual OOD detection, a critical problem often encountered in the development of machine learning systems. Despite its significance, this problem has received limited attention in the literature and remains largely unexplored. We conduct a thorough investigation of this problem and identify key research issues. Through our pilot study, we uncover why existing textual OOD detection methods fall short in addressing this issue. Building on these findings, we propose a promising solution that leverages latent representation generation and self-supervision. Finally, we delve into Transformer-based representation learning for textual OOD detection. Existing methods commonly adopt the discriminative training objective – maximizing the conditional likelihood p(y|x) – which is biased and leads to suboptimal OOD detection performance. To address this issue, we propose a generative training framework based on variational inference, which directly optimizes the likelihood of the joint distribution p(x, y). Specifically, our framework takes into account the unique characteristics of textual data and leverages the representations of pre-trained Transformers in an efficient manner. In summary, this thesis provides novel and effective Transformer-based approaches to address the challenges of textual OOD detection. Our proposed methods show significant improvements over existing state-of-the-art methods, and our findings can have practical applications in improving the robustness of machine learning models in NLP.
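The (K+1)-way training setup described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: mixing ID features from different classes by convex combination is one assumed way to synthesize pseudo OOD samples, and the mixing-weight range and the function names make_pseudo_ood and build_k_plus_one_training_set are hypothetical. Random vectors stand in for Transformer sentence features.

```python
import numpy as np


def make_pseudo_ood(features, labels, num_samples, rng):
    """Synthesize pseudo OOD features by mixing ID features of different classes.

    Assumption: convex combination is used here only as an illustrative recipe.
    """
    if np.unique(labels).size < 2:
        raise ValueError("need at least two ID classes to mix across classes")
    n = features.shape[0]
    pseudo = np.empty((num_samples, features.shape[1]), dtype=features.dtype)
    for i in range(num_samples):
        # Resample until the two parents carry different ID labels.
        a, b = rng.integers(0, n, size=2)
        while labels[a] == labels[b]:
            a, b = rng.integers(0, n, size=2)
        # Hypothetical weight range: keeps the mixture away from either parent.
        lam = rng.uniform(0.4, 0.6)
        pseudo[i] = lam * features[a] + (1.0 - lam) * features[b]
    return pseudo


def build_k_plus_one_training_set(features, labels, num_classes, num_pseudo, rng):
    """Append pseudo OOD samples labeled with the extra class index K."""
    pseudo = make_pseudo_ood(features, labels, num_pseudo, rng)
    ood_labels = np.full(num_pseudo, num_classes, dtype=labels.dtype)
    x = np.concatenate([features, pseudo], axis=0)
    y = np.concatenate([labels, ood_labels], axis=0)
    return x, y


rng = np.random.default_rng(0)
K = 3                                        # number of ID classes
id_features = rng.normal(size=(30, 8))       # stand-in for sentence features
id_labels = np.repeat(np.arange(K), 10)      # 10 examples per ID class
x_train, y_train = build_k_plus_one_training_set(id_features, id_labels, K, 12, rng)
```

Any standard classifier with K+1 outputs could then be fit on x_train and y_train; at test time, an input predicted as class K would be treated as OOD, so no separate confidence threshold is needed in this sketch.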
Rights:	All rights reserved
Access:	open access

Files in This Item:

File	Description	Size	Format
6893.pdf	For All Users	1.93 MB	Adobe PDF

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/12445