Author: Wu, Feijie
Title: Gradient-wise optimization for distributed machine learning
Advisors: Guo, Song (COMP)
Degree: M.Phil.
Year: 2024
Subject: Machine learning
Electronic data processing -- Distributed processing
Hong Kong Polytechnic University -- Dissertations
Department: Department of Computing
Pages: xiv, 131 pages : color illustrations
Language: English
Abstract: Distributed machine learning has attracted booming interest and achieved rapid development over the past decades. It allows multiple nodes with different data sources to collaboratively train a model using their local computational resources, achieving linear speedup with respect to the number of nodes. However, this distributed paradigm faces three main challenges. First, full-precision synchronizations occupy significant communication bandwidth. In particular, traditional algorithms require global synchronization at every iteration, which consumes considerable communication overhead and leads to a critical slowdown in training time. Second, computational capabilities vary among nodes, resulting in resource underutilization because all nodes must wait for the slowest one. Third, conventional analyses assume that data are independently and identically distributed among nodes. In reality, however, the data are heterogeneous, since no two clients share samples when data sharing is not permitted.
To avoid overwhelming communication consumption, a common practice is to adopt a gradient compression approach, e.g., one-bit compressed stochastic gradient descent (signSGD). Traditional signSGD has achieved great success in star topologies. However, due to cascading compression, it cannot be directly employed in multi-hop all-reduce (MAR), a synchronization paradigm widely adopted in network-intensive high-performance computing systems such as public clouds. To support signSGD under MAR, we propose a learning synchronization system, Marsit. It prevents cascading compression by employing a bit-wise operation for unbiased sign aggregation, together with a global compensation approach that accommodates the compression deviation.
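To make the general idea concrete, the sketch below shows sign compression with a local error-compensation buffer, the ingredient that keeps one-bit updates unbiased over time. This is only an illustrative sketch of the signSGD-with-compensation principle, not the thesis's Marsit system; all function and variable names are hypothetical.

```python
# Illustrative sketch (NOT the Marsit implementation): one-bit sign compression
# with an error-compensation residual carried between rounds.
import numpy as np

def compress_with_compensation(grad, residual):
    """Compress a gradient to signs, carrying the quantization error forward."""
    corrected = grad + residual               # add previously un-transmitted error
    scale = np.mean(np.abs(corrected))        # one shared magnitude per tensor
    signs = np.sign(corrected)                # one bit per coordinate (plus the scale)
    new_residual = corrected - scale * signs  # error kept locally for the next round
    return signs, scale, new_residual

# Toy usage: two workers average their sign-compressed gradients.
rng = np.random.default_rng(0)
residuals = [np.zeros(4), np.zeros(4)]
grads = [rng.normal(size=4), rng.normal(size=4)]
decoded = []
for i, g in enumerate(grads):
    signs, scale, residuals[i] = compress_with_compensation(g, residuals[i])
    decoded.append(scale * signs)             # what the aggregator reconstructs
update = np.mean(decoded, axis=0)             # aggregated low-precision update
print(update)
```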
Another solution to reducing the communication overhead is to allow nodes to perform multiple but inconsistent numbers of local updates, which simultaneously mitigates computational heterogeneity. However, this strategy can lead to objective inconsistency when data heterogeneity exists, which undermines model performance. Consequently, we design a gradient calibration approach, FedaGrac, which calibrates each local direction toward a predictive global orientation. With the estimated orientation, the aggregated model is guaranteed not to deviate substantially from the global optimum, while the local updates of faster nodes are fully utilized.
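The following sketch illustrates the general flavor of calibrating local updates toward an estimated global direction, using a control-variate-style correction (as in, e.g., SCAFFOLD). It is not the FedaGrac algorithm itself; the calibration rule, the least-squares model, and all names are illustrative assumptions.

```python
# Minimal sketch (NOT FedaGrac itself): local SGD steps calibrated toward an
# estimated global gradient direction to counteract client drift.
import numpy as np

def grad_ls(w, X, y):
    """Least-squares gradient on one client's local data."""
    return 2 * X.T @ (X @ w - y) / len(y)

def local_training(w_global, c_global, c_local, data, lr=0.05, local_steps=5):
    """Run local steps, correcting each local gradient with (c_global - c_local)."""
    w = w_global.copy()
    X, y = data
    for _ in range(local_steps):
        calibrated = grad_ls(w, X, y) - c_local + c_global  # steer toward global direction
        w -= lr * calibrated
    return w

rng = np.random.default_rng(1)
w_global = np.zeros(3)
# Heterogeneous clients: each holds its own (X, y) drawn from a shifted distribution.
clients = [(rng.normal(loc=i, size=(20, 3)), rng.normal(size=20)) for i in range(4)]
# Rough direction estimates: per-client gradients at w_global and their average.
c_locals = [grad_ls(w_global, X, y) for X, y in clients]
c_global = np.mean(c_locals, axis=0)
w_new = np.mean(
    [local_training(w_global, c_global, c_i, d) for c_i, d in zip(c_locals, clients)],
    axis=0,
)
print(w_new)
```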
In a nutshell, we utilize gradient-wise approaches to optimize training efficiency in distributed machine learning. Theoretical results reveal that our gradient compression framework retains the same convergence rate as non-compression mechanisms, while the gradient calibration algorithm achieves an improved order of convergence rate over state-of-the-art approaches. Extensive experiments demonstrate the superiority of the proposed methods.
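For reference only (the thesis's exact bounds and constants appear in the full text), the standard baseline that compressed methods aim to match is the rate of synchronous distributed SGD with N workers on smooth non-convex objectives:

\[
  \min_{t \le T} \mathbb{E}\,\big\| \nabla f(\mathbf{w}_t) \big\|^2
  \;=\; \mathcal{O}\!\left( \frac{1}{\sqrt{N T}} \right),
\]

which exhibits the linear speedup in N mentioned in the abstract.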
Rights: All rights reserved
Access: open access

Files in This Item:
File: 7402.pdf · Description: For All Users · Size: 1.23 MB · Format: Adobe PDF



Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/12968