Full metadata record
DC Field: Value [Language]
dc.contributor: Department of Computing [en_US]
dc.contributor.advisor: Guo, Song (COMP) [en_US]
dc.creator: Wu, Feijie
dc.identifier.uri: https://theses.lib.polyu.edu.hk/handle/200/12968
dc.language: English [en_US]
dc.publisher: Hong Kong Polytechnic University [en_US]
dc.rights: All rights reserved [en_US]
dc.title: Gradient-wise optimization for distributed machine learning [en_US]
dcterms.abstract: Distributed machine learning has attracted booming interest and developed rapidly over the past decades. It allows multiple nodes with different data sources to collaboratively train a model using their local computational resources, achieving linear speedup with respect to the number of nodes. However, the distributed setting faces three main challenges. First, full-precision synchronization occupies significant communication bandwidth: traditional algorithms require global synchronization at every iteration, which incurs considerable communication overhead and critically slows down training. Second, computational capabilities vary among nodes, causing resource underutilization because all nodes must wait for the slowest one. Third, data are conventionally assumed to be independently and identically distributed among nodes; in reality, the data are heterogeneous, since no two clients' datasets intersect when data sharing is not permitted. [en_US]
dcterms.abstract: To avoid overwhelming communication consumption, a common practice is to adopt a gradient compression approach, e.g., one-bit compressed stochastic gradient descent (signSGD). Traditional signSGD has achieved great success in star topologies. However, due to cascading compression, it cannot be directly employed in multi-hop all-reduce (MAR), a synchronization paradigm widely adopted in network-intensive high-performance computing systems such as public clouds. To support signSGD under MAR, we propose a learning synchronization system, Marsit. It prevents cascading compression by employing a bit-wise operation for unbiased sign aggregation together with a unique global compensation approach that accommodates the compression deviation. [en_US]
dcterms.abstract: Another way to reduce the communication overhead is to allow nodes to perform multiple, and possibly inconsistent numbers of, local updates, which simultaneously addresses computational heterogeneity. However, this strategy can lead to objective inconsistency when data heterogeneity exists, which undermines model performance. Consequently, we design a gradient calibration approach, FedaGrac, which calibrates each local direction toward a predicted global orientation. Using the estimated orientation, FedaGrac guarantees that the aggregated model does not deviate substantially from the global optimum while fully utilizing the local updates of faster nodes. [en_US]
dcterms.abstract: In a nutshell, we utilize gradient-wise approaches to optimize training efficiency in distributed machine learning. Theoretical results reveal that our gradient compression framework retains the same convergence rate as non-compression mechanisms, while our gradient calibration algorithm achieves a better order of convergence rate than state-of-the-art approaches. Extensive experiments demonstrate the superiority of the proposed methods. [en_US]
dcterms.extent: xiv, 131 pages : color illustrations [en_US]
dcterms.isPartOf: PolyU Electronic Theses [en_US]
dcterms.issued: 2024 [en_US]
dcterms.educationalLevel: M.Phil. [en_US]
dcterms.educationalLevel: All Master [en_US]
dcterms.LCSH: Machine learning [en_US]
dcterms.LCSH: Electronic data processing -- Distributed processing [en_US]
dcterms.LCSH: Hong Kong Polytechnic University -- Dissertations [en_US]
dcterms.accessRights: open access [en_US]
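
The abstract describes Marsit only at a high level. As a rough illustration of the general technique it builds on, here is a minimal sketch of one-bit sign compression with an error-compensation buffer, in the spirit of signSGD variants; the class name, the mean-magnitude scaling, and the aggregation rule are illustrative assumptions, not the thesis's actual Marsit design.

# Minimal sketch (assumed, not the thesis's Marsit code): one-bit sign
# compression with an error-compensation buffer, as used by signSGD-style
# methods. The buffer carries the quantization error forward so that
# repeated compression does not accumulate bias.
import numpy as np

class OneBitCompressor:
    def __init__(self, dim):
        self.residual = np.zeros(dim)  # accumulated compression error

    def compress(self, grad):
        corrected = grad + self.residual            # fold in past error
        scale = np.abs(corrected).mean()            # one scalar per vector
        signs = np.sign(corrected)                  # one bit per coordinate
        self.residual = corrected - scale * signs   # error feedback
        return scale, signs

def aggregate(scales, sign_vectors):
    # Aggregate all workers' compressed gradients in one step, rather than
    # re-compressing partial sums hop by hop (the "cascading compression"
    # that breaks plain signSGD under multi-hop all-reduce).
    return sum(s * v for s, v in zip(scales, sign_vectors)) / len(scales)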
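
Likewise, the gradient calibration idea behind FedaGrac can be pictured as a control-variate-style local update. The function below is a hypothetical sketch (the correction terms local_dir and global_dir and their use are assumptions), not the algorithm as specified in the thesis.

# Hypothetical sketch of gradient calibration (not FedaGrac itself): each
# local step is corrected toward an estimated global orientation, so a fast
# node's extra local updates drift less under data heterogeneity.
import numpy as np

def calibrated_local_training(w, grad_fn, local_dir, global_dir,
                              num_steps, lr=0.1):
    """Run num_steps local SGD steps from weights w, replacing the node's
    own bias (local_dir) with the predicted global direction (global_dir)
    at every step."""
    for _ in range(num_steps):
        g = grad_fn(w)                      # stochastic local gradient
        g_cal = g - local_dir + global_dir  # calibration step
        w = w - lr * g_cal
    return w

# Toy usage on a quadratic objective (for illustration only):
w0 = np.zeros(2)
grad_fn = lambda w: 2 * (w - np.array([1.0, -1.0]))
w_new = calibrated_local_training(w0, grad_fn, local_dir=np.zeros(2),
                                  global_dir=np.zeros(2), num_steps=5)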

Files in This Item:
File      Description    Size     Format
7402.pdf  For All Users  1.23 MB  Adobe PDF


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/12968