Author: Yong, Hongwei
Title: Effective and efficient optimization methods for deep learning
Advisors: Zhang, Lei (COMP)
Degree: Ph.D.
Year: 2022
Subjects: Machine learning; Hong Kong Polytechnic University -- Dissertations
Department: Department of Computing
Pages: xxi, 159 pages : color illustrations
Language: English
Abstract: Optimization techniques play an essential role in deep learning, and a favorable optimization approach can greatly boost the final performance of a trained deep neural network (DNN). Generally speaking, a good DNN optimizer has two major goals: accelerating the training process and improving the model's generalization capability. In this thesis, we study effective and efficient optimization techniques for deep learning.
Batch normalization (BN) is a key technique for stable and effective DNN training; it simultaneously improves the training speed and the generalization performance of the model. However, it is well known that the training and inference stages of BN are somewhat inconsistent, and the performance of BN drops significantly when the training batch size is small. In Chapter 2, we prove that BN actually introduces a certain level of noise into the sample mean and variance during training, and we propose a Momentum Batch Normalization (MBN) method to control this noise level and improve training with BN. Meanwhile, in Chapter 3, we put forward an effective inference method for BN, i.e., Batch Statistics Regression (BSR), which uses the instance statistics to predict the batch statistics with a simple linear regression model. BSR estimates the batch statistics more accurately, making the training and inference of BN much more consistent. We evaluate MBN and BSR on CIFAR10/CIFAR100, Mini-ImageNet, ImageNet, etc.
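To make the BSR idea concrete, below is a minimal sketch of the inference-time estimation it describes: per-channel batch statistics are predicted from instance statistics with a simple linear regression. The function names, the use of NumPy, and the fitting procedure are illustrative assumptions, not the thesis implementation.

import numpy as np

# Fit a per-channel linear model mu_batch ~= a * mu_instance + b from
# (instance mean, batch mean) pairs collected during training.
def fit_bsr(instance_means, batch_means):
    # instance_means, batch_means: arrays of shape (num_steps, channels)
    channels = instance_means.shape[1]
    a = np.empty(channels)
    b = np.empty(channels)
    for c in range(channels):
        a[c], b[c] = np.polyfit(instance_means[:, c], batch_means[:, c], deg=1)
    return a, b

# At inference, predict the batch mean for a single sample x of shape (C, H, W)
# from its instance statistics instead of relying only on running averages.
def bsr_predict_mean(x, a, b):
    mu_instance = x.mean(axis=(1, 2))
    return a * mu_instance + b

The same kind of regression can be fit for the variance; predicting both statistics this way is what makes BN inference behave more like BN training.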
Gradient descent is the dominant method for updating DNN models because of its simplicity and efficiency in handling large-scale data. In Chapter 4, we present a simple yet effective DNN optimization technique, namely Gradient Centralization (GC), which operates directly on gradients by centralizing the gradient vectors to have zero mean. GC can be viewed as a projected gradient descent method with a constrained loss function. We show that GC regularizes both the weight space and the output feature space, and thus boosts the generalization performance of DNNs. In Chapter 5, we present a Feature Stochastic Gradient Descent (FSGD) method to approximate the desired feature outputs with one-step gradient descent; FSGD alleviates the singularity of the feature space and thus enhances feature learning efficacy. Finally, in Chapter 6 we propose a novel optimization approach, namely Embedded Feature Whitening (EFW), which overcomes several drawbacks of conventional feature whitening methods while inheriting their advantages. EFW only adjusts the weight gradients with the whitening matrix, without changing any part of the network, so it can be easily adopted to optimize pre-trained and well-defined DNN architectures. We validate these methods on various tasks, including image classification on CIFAR10/CIFAR100, ImageNet and fine-grained image classification datasets, as well as object detection and instance segmentation on COCO, and they achieve clear performance gains.
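As a concrete illustration of the GC operation described above, the sketch below centralizes each weight gradient to zero mean before the optimizer step. It is written with PyTorch for concreteness; the function name and the choice of averaging over all dimensions except the output-channel dimension are assumptions made for this sketch, not code from the thesis.

import torch

def centralize_gradient(grad: torch.Tensor) -> torch.Tensor:
    # For convolutional weights (out, in, kh, kw) or fully connected weights
    # (out, in), subtract the mean taken over every dimension except dim 0,
    # so each output filter's gradient vector has zero mean.
    if grad.dim() > 1:
        dims = tuple(range(1, grad.dim()))
        grad = grad - grad.mean(dim=dims, keepdim=True)
    return grad

# Typical usage inside a training loop, just before optimizer.step():
#   for p in model.parameters():
#       if p.grad is not None and p.grad.dim() > 1:
#           p.grad.copy_(centralize_gradient(p.grad))

Because centralization is a single subtraction per gradient tensor, it adds almost no overhead and can be dropped into existing optimizers such as SGD or Adam.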
In summary, this thesis presents five deep learning optimization methods. MBN and BSR improve BN training and inference, respectively; GC adjusts the weight gradients with a centralization operation; FSGD provides a practical approach to feature-driven gradient descent; and EFW embeds existing feature whitening methods into the optimization algorithm for effective deep learning. Extensive experiments demonstrate their effectiveness and efficiency for DNN optimization.
Rights: All rights reserved
Access: open access

Files in This Item:
File: 6401.pdf (For All Users), 3.21 MB, Adobe PDF


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/12027