Full metadata record
DC Field | Value | Language
dc.contributor | Department of Computing | en_US
dc.contributor.advisor | Zhang, Lei (COMP) | en_US
dc.creator | Yong, Hongwei | -
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/12027 | -
dc.language | English | en_US
dc.publisher | Hong Kong Polytechnic University | en_US
dc.rights | All rights reserved | en_US
dc.title | Effective and efficient optimization methods for deep learning | en_US
dcterms.abstract | Optimization techniques play an essential role in deep learning, and a favorable optimization approach can greatly boost the final performance of the trained deep neural network (DNN). Generally speaking, there are two major goals for a good DNN optimizer: accelerating the training process and improving the model's generalization capability. In this thesis, we study effective and efficient optimization techniques for deep learning. | en_US
dcterms.abstract | Batch normalization (BN) is a key technique for stable and effective DNN training. It can simultaneously improve the training speed and the generalization performance of the model. However, it is well known that the training and inference stages of BN are somewhat inconsistent, and the performance of BN drops significantly when the training batch size is small. In Chapter 2, we prove that BN actually introduces a certain level of noise into the sample mean and variance during the training process. We then propose a Momentum Batch Normalization (MBN) method to control the noise level and improve training with BN. Meanwhile, in Chapter 3, we put forward an effective inference method for BN, i.e., Batch Statistics Regression (BSR), which uses the instance statistics to predict the batch statistics with a simple linear regression model (a brief illustrative sketch follows this record). BSR estimates the batch statistics more accurately, making the training and inference of BN much more consistent. We evaluate these methods on CIFAR10/CIFAR100, Mini-ImageNet, ImageNet, and other datasets. | en_US
dcterms.abstract | Gradient descent is the dominant way to update DNN models because of its simplicity and efficiency in handling large-scale data. In Chapter 4, we present a simple yet effective DNN optimization technique, namely gradient centralization (GC), which operates directly on gradients by centralizing the gradient vectors to have zero mean (also sketched after this record). GC can be viewed as a projected gradient descent method with a constrained loss function. We show that GC can regularize both the weight space and the output feature space, so that it boosts the generalization performance of DNNs. In Chapter 5, we present a feature stochastic gradient descent (FSGD) method that approximates the desired feature outputs with one-step gradient descent. FSGD improves the singularity of the feature space and thus enhances the efficacy of feature learning. Finally, in Chapter 6 we propose a novel optimization approach, namely Embedded Feature Whitening (EFW), which overcomes several drawbacks of conventional feature whitening methods while inheriting their advantages. EFW only adjusts the weight gradients with the whitening matrix, without changing any part of the network, so it can be easily adopted to optimize pre-trained and well-defined DNN architectures. We evaluate these methods on various tasks, including image classification on CIFAR10/CIFAR100, ImageNet, and fine-grained image classification datasets, as well as object detection and instance segmentation on COCO, and they achieve clear performance gains. | en_US
dcterms.abstract | In summary, this thesis presents five deep learning optimization methods. Among them, MBN and BSR improve BN training and inference, respectively; GC adjusts the weight gradients with a centralization operation; FSGD provides a practical approach to feature-driven gradient descent; and EFW embeds feature whitening into the optimization algorithms for effective deep learning. Extensive experiments demonstrate their effectiveness and efficiency for DNN optimization. | en_US
dcterms.extent | xxi, 159 pages : color illustrations | en_US
dcterms.isPartOf | PolyU Electronic Theses | en_US
dcterms.issued | 2022 | en_US
dcterms.educationalLevel | Ph.D. | en_US
dcterms.educationalLevel | All Doctorate | en_US
dcterms.LCSH | Machine learning | en_US
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | en_US
dcterms.accessRights | open access | en_US
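
The following minimal sketch illustrates the idea behind Batch Statistics Regression (BSR) as described in the abstract above: predicting a BN layer's batch statistics from a single sample's instance statistics with a simple linear regression model. The simulated data, the per-channel scalar fit, and all names below are illustrative assumptions, not the implementation from the thesis.

    import numpy as np

    # Sketch of BSR's core idea: learn a linear map from instance statistics
    # (computed on one sample) to batch statistics (computed on a full batch),
    # then use that map at inference time.

    def fit_statistics_regression(instance_stats, batch_stats):
        """Least-squares fit of batch_stat ~ a * instance_stat + b."""
        a, b = np.polyfit(instance_stats, batch_stats, deg=1)
        return a, b

    rng = np.random.default_rng(0)

    # Pretend these were collected over many training iterations for one channel:
    # the mean of a single sample's activations and the mean over the whole batch.
    instance_means = rng.normal(loc=0.5, scale=0.2, size=1000)
    batch_means = 0.9 * instance_means + 0.05 + rng.normal(scale=0.01, size=1000)

    a, b = fit_statistics_regression(instance_means, batch_means)

    # At inference, a test sample's own (instance) mean is mapped to an estimated
    # batch mean, keeping normalization closer to training-time behavior than
    # relying only on frozen running statistics.
    test_instance_mean = 0.62
    estimated_batch_mean = a * test_instance_mean + b
    print(f"estimated batch mean: {estimated_batch_mean:.4f}")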
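
Similarly, this sketch shows the core operation of Gradient Centralization (GC) inside a plain SGD step: the gradient of each output unit is shifted to have zero mean before the weight update. The layer shape, learning rate, and function names are illustrative assumptions rather than the thesis implementation.

    import numpy as np

    def centralize_gradient(grad):
        """Subtract, for each output channel (axis 0), the mean of its gradient
        taken over all remaining dimensions."""
        if grad.ndim > 1:
            axes = tuple(range(1, grad.ndim))
            grad = grad - grad.mean(axis=axes, keepdims=True)
        return grad

    def sgd_step_with_gc(weight, grad, lr=0.1):
        """One vanilla SGD update applied to the centralized gradient."""
        return weight - lr * centralize_gradient(grad)

    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 128))   # e.g. a fully connected layer's weights
    g = rng.normal(size=(64, 128))   # its gradient from backpropagation
    w_new = sgd_step_with_gc(w, g)

    # Each row of the centralized gradient now has (numerically) zero mean.
    print(np.abs(centralize_gradient(g).mean(axis=1)).max())

Geometrically, subtracting the per-row mean projects each gradient onto the hyperplane whose normal is the all-ones vector, which is one way to read the abstract's description of GC as a projected gradient descent method.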

Files in This Item:
File | Description | Size | Format
6401.pdf | For All Users | 3.21 MB | Adobe PDF


Copyright Undertaking

As a bona fide Library user, I declare that:

  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/12027