Author: Wang, Yingchun
Title: Towards sample bias-aware deep neural network compression
Advisors: Guo, Song (COMP); Xiao, Bin (COMP)
Degree: Ph.D.
Year: 2024
Subject: Neural networks (Computer science); Deep learning (Machine learning); Data compression (Computer science); Hong Kong Polytechnic University -- Dissertations
Department: Department of Computing
Pages: xvii, 165 pages : color illustrations
Language: English
Abstract: Deep neural networks (DNNs) have achieved remarkable success in fields such as image classification, natural language processing, and speech synthesis. This success often relies on a large number of well-organized parameters performing complex computations, which introduces significant resource overhead. Over the past decade, research on DNN compression has proliferated, focusing on efficient architecture representations while overlooking the impact of inter-sample variation. In fact, model redundancy is highly sample-dependent, influenced by factors such as object type, environmental context, and data quality. This thesis therefore systematically explores sample-oriented deep model compression methods, aiming to improve the performance of lightweight models in complex, real-world data environments.

First, to address the channel mis-deletion caused by sample object bias and the resulting variation in channel importance, we propose a global channel attention-based pruning method, named GlobalPru, to improve the performance of statically pruned models. Its pipeline has two stages: GlobalPru first identifies a global channel ranking through a majority-voting strategy; then, during sparse training, it pushes every sample-wise (local) channel attention toward the global ranking via a learn-to-rank regularization (see the first sketch below). Because all samples then share the same ranking of relative channel importance, GlobalPru can statically execute a single fit-to-all pruning. Extensive experiments demonstrate the method's effectiveness: benchmarked on ImageNet, GlobalPru reduces the FLOPs of ResNet-50 by 54.0% with only 0.2% top-1 accuracy degradation, outperforming state-of-the-art methods in both accuracy and computational cost.

Second, to address the erroneous channel sparsity caused by sample environment (domain) bias and the resulting differences in pruning demands, we propose a novel spurious feature-targeted pruning method, named SFP, built on a two-stage pipeline. In each iteration, SFP first identifies in-domain samples entangled with spurious features using a theoretically derived loss threshold; it then weakens the feature projections of these samples in the model space via a regularization term, thereby sparsifying branches that fit spurious features and aligning the pruned model with invariant feature directions (see the second sketch below). Benchmarked on DomainBed, SFP outperforms the state-of-the-art pruning method for out-of-distribution generalization, with significant accuracy improvements of 2.9%, 9.4%, and 3.2% on VLCS, PACS, and OfficeHome, respectively.
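To make the two-stage idea concrete, here is a minimal PyTorch sketch of a majority-vote global channel ranking plus a hinge-style learn-to-rank penalty that pulls each sample's local channel attention toward that ranking. The attention proxy (mean absolute activation), the Borda-style vote, and the hinge form of the penalty are illustrative assumptions, not the thesis's exact formulation.

```python
import torch

def channel_attention(feat: torch.Tensor) -> torch.Tensor:
    """Per-sample channel attention proxy: mean |activation| over H and W.
    feat: (N, C, H, W) -> (N, C). A stand-in for a learned attention module."""
    return feat.abs().mean(dim=(2, 3))

def majority_vote_ranking(attn: torch.Tensor) -> torch.Tensor:
    """Borda-style majority vote: each sample ranks the channels, and channels
    with the best (lowest) mean rank across samples come first."""
    order = attn.argsort(dim=1, descending=True)  # (N, C): channel ids by importance
    ranks = order.argsort(dim=1).float()          # (N, C): rank of each channel per sample
    return ranks.mean(dim=0).argsort()            # (C,): global order, most important first

def learn_to_rank_loss(attn, global_order, margin=0.0):
    """Hinge penalty whenever a sample's attention disagrees with the global
    order: for globally consecutive channels i before j, penalize attn_j > attn_i."""
    a = attn[:, global_order]                     # columns sorted by global importance
    return torch.relu(a[:, 1:] - a[:, :-1] + margin).mean()

# Toy usage: one conv layer's feature maps for a batch of 8 samples.
feat = torch.randn(8, 16, 32, 32)
attn = channel_attention(feat)
global_order = majority_vote_ranking(attn)
reg = learn_to_rank_loss(attn, global_order)      # added to the sparse-training loss
print(global_order[:5].tolist(), float(reg))      # channels to keep first; penalty value
```

Once the regularizer has driven every local attention toward the shared order, pruning reduces to keeping the first k channels of global_order for all samples, which is what makes a single static, fit-to-all pruning possible.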
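The second sketch illustrates one SFP-style iteration under stated assumptions: samples whose per-sample loss falls below a threshold are flagged as likely fitted through spurious features (the thesis derives this threshold theoretically; the constant here is arbitrary, and so is the direction of the test), and an L2 penalty then shrinks their feature projections so that branches fitting those features lose magnitude and can be pruned. The toy model, threshold value, and penalty weight are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical feature extractor and classification head.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU())
head = nn.Linear(64, 10)
opt = torch.optim.SGD(list(model.parameters()) + list(head.parameters()), lr=0.1)

def sfp_step(x, y, loss_threshold=1.5, penalty=1e-2):
    feats = model(x)                                   # (N, 64) feature projections
    per_sample = F.cross_entropy(head(feats), y, reduction="none")
    flagged = per_sample < loss_threshold              # stage 1: spurious-fit suspects
    if flagged.any():                                  # stage 2: weaken their projections
        reg = (feats[flagged] ** 2).sum(dim=1).mean()
    else:
        reg = feats.sum() * 0.0                        # keep the graph valid when none flagged
    loss = per_sample.mean() + penalty * reg
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item(), int(flagged.sum())

x, y = torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,))
print(sfp_step(x, y))                                  # (loss, number of flagged samples)
```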
Third, to address the vulnerability of quantized models caused by sample quality bias and the resulting differences in quantization sensitivity, we propose a data quality-aware mixed-precision quantization method, named DQMQ, which dynamically allocates layer-wise bit-widths conditioned on the input sample. We first demonstrate that the optimal bit-width configuration is highly dependent on sample quality. DQMQ is then modeled as a hybrid reinforcement learning task that combines policy optimization for bit-width decision-making with supervised quantization training. By relaxing the discrete bit-width sampling to a continuous probability distribution encoded by a few learnable parameters, DQMQ becomes differentiable and can be optimized end-to-end with a hybrid objective of task accuracy and quantization benefit (see the sketch below). Benchmarked on ImageNet, DQMQ improves accuracy by 2.1% for ResNet-18, 1.7% for ResNet-50, and 0.5% for MobileNet-V2 over state-of-the-art methods at the same compression rate.

In summary, this thesis presents in-depth research on sample bias-aware deep neural network pruning and quantization, focusing on the challenges arising from sample object bias, sample environment bias, and sample quality bias. It provides theoretical and technical support for model compression in complex and dynamic real-world data environments, and extensive experiments demonstrate the effectiveness of the proposed methods.
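The sketch below shows only the continuous relaxation that makes DQMQ differentiable: per-layer learnable logits over candidate bit-widths, a softmax-weighted mixture of quantized weights with a straight-through estimator, and a hybrid objective of task loss plus expected bit cost. The input-conditioned policy and the reinforcement-learning component are omitted, and the uniform quantizer, candidate set, and cost weight are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

BITS = torch.tensor([2.0, 4.0, 8.0])                 # assumed candidate bit-widths

def quantize(w: torch.Tensor, bits: float) -> torch.Tensor:
    """Uniform symmetric quantization with a straight-through estimator."""
    scale = w.abs().max() / (2 ** (bits - 1) - 1)
    q = torch.round(w / scale) * scale
    return w + (q - w).detach()                      # forward: q; backward: identity

class DQMQLinear(nn.Module):
    def __init__(self, n_in, n_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_out, n_in) * 0.1)
        self.logits = nn.Parameter(torch.zeros(len(BITS)))  # learnable bit-width policy

    def forward(self, x):
        p = F.softmax(self.logits, dim=0)            # relaxed bit-width distribution
        w = sum(p[i] * quantize(self.weight, b.item()) for i, b in enumerate(BITS))
        return x @ w.t(), (p * BITS).sum()           # output and expected bit cost

layer = DQMQLinear(32, 10)
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
out, bit_cost = layer(x)
loss = F.cross_entropy(out, y) + 0.01 * bit_cost     # hybrid accuracy/benefit objective
loss.backward()                                      # end-to-end differentiable
print(float(loss), float(bit_cost))
```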
Rights: All rights reserved
Access: open access
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/13522