| Author: | Zhao, Yuqing |
| Title: | Scalable continual learning for neural network-based models |
| Advisors: | Cao, Jiannong (COMP) |
| Degree: | Ph.D. |
| Year: | 2025 |
| Subject: | Neural networks (Computer science) ; Machine learning ; Artificial intelligence -- Data processing ; Hong Kong Polytechnic University -- Dissertations |
| Department: | Department of Computing |
| Pages: | xxv, 173 pages : color illustrations |
| Language: | English |
| Abstract: | Neural networks serve as the foundation of modern AI, achieving remarkable success in areas such as vision, language, and robotics thanks to their ability to learn complex patterns. However, they have a significant drawback: they are static. When confronted with new data in a changing environment, they tend to forget previous knowledge catastrophically. To learn a new task, they must be retrained from scratch, which is often inefficient and impractical. Continual learning is a promising direction for addressing this problem: it aims to enable models to learn continuously from a stream of data without retraining. But for continual learning to move from a laboratory concept to a real-world technology, it must be scalable, meaning that a model's performance should not degrade catastrophically as the number of datasets or the duration of learning grows. Scalability can be broken down into three essential requirements: seamless retention of accuracy (i.e., task-agnostic operation), robustness to heterogeneous data, and the ability to adapt sustainably over the long term. Existing continual learning methods have limited scalability because they focus primarily on preserving accuracy, often at the expense of the other requirements. Rehearsal and regularization methods reduce forgetting by storing past data or constraining weight updates; while they are task-agnostic, they struggle with long-term accuracy and data heterogeneity. In contrast, structural methods add components for new tasks and achieve high accuracy, but they require manual task annotations and are resource-intensive, making them unsuitable for long-term use. Overall, existing methods fail to meet all three requirements simultaneously, and this is the gap this thesis aims to address. The gap arises from three fundamental challenges posed by real-world data streams. Current research concentrates on the first challenge: non-IID (not independent and identically distributed) data, which causes catastrophic forgetting; this problem has not yet been resolved well enough to ensure seamless retention of accuracy. The second challenge is that neural networks must manage the heterogeneity of datasets, which may vary in size, similarity, and complexity. Finally, to support sustainable long-term adaptation, neural networks must be able to handle incrementally arriving datasets. The goal of my PhD is to tackle these challenges comprehensively and enable scalable continual learning for neural network-based models, as summarized below. To address the first challenge, forgetting on non-IID datasets, I drew inspiration from the neural reuse principle in the brain and introduced a parameter reuse algorithm featuring task-agnostic parameter isolation to overcome catastrophic forgetting. The algorithm isolates key neural network parameters by freezing their gradient updates during training, allowing the remaining parameters to adapt to new data while retaining previously acquired knowledge. I also extended this problem to the more complex continual federated learning setting, where data is non-IID not only over time but also across clients, introducing both temporal forgetting and spatial bias. To address these issues, we developed FedDistill, which employs group distillation to mitigate bias against underrepresented classes and incorporates a hybrid local-global architecture to combat forgetting.
Furthermore, to tackle heterogeneity across sequential datasets, this thesis introduces a novel adaptive continual learning method. Specifically, it employs fine-grained, data-driven pruning to adapt to variations in data complexity and dataset size, and it incorporates the previously proposed task-agnostic parameter isolation to mitigate the varying degrees of catastrophic forgetting caused by differences in data similarity. The study constructs several challenging scenarios in which data domains and classes arrive incrementally and conducts experiments to evaluate the proposed algorithms. Finally, to enable sustainable long-term adaptation to incrementally arriving data, this thesis analyzes model growth, in which a pre-trained neural network gains parameters to cope with limited network capacity. However, improper model growth can severely degrade previously learned knowledge, an issue this thesis identifies and names growth-induced forgetting (GIFt). I propose a novel sparse model growth approach that employs data-driven sparse layer expansion and on-data initialization to overcome GIFt while enhancing adaptability to new data. |
| Rights: | All rights reserved |
| Access: | open access |
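
As a rough illustration of the task-agnostic parameter isolation idea described in the abstract above, the sketch below freezes the gradient updates of a subset of "key" parameters so that only the remaining capacity adapts to new data. The importance criterion (weight magnitude), the freeze ratio, and the helper names are illustrative assumptions, not the thesis's actual algorithm.

```python
# Illustrative sketch only (not the thesis's exact method): task-agnostic
# parameter isolation by freezing gradient updates of "key" parameters.
import torch
import torch.nn as nn

def build_freeze_masks(model: nn.Module, freeze_ratio: float = 0.3):
    """Mark the largest-magnitude weights as key parameters to protect.
    The magnitude criterion and the ratio are assumptions for illustration."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:                      # skip biases/norms in this toy example
            continue
        k = max(1, int(p.numel() * freeze_ratio))
        threshold = p.detach().abs().flatten().kthvalue(p.numel() - k + 1).values
        masks[name] = p.detach().abs() >= threshold   # True = frozen (key) weight
    return masks

def train_step(model, loss_fn, optimizer, x, y, masks):
    """One step on new data; key parameters receive no gradient update."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    for name, p in model.named_parameters():
        if name in masks and p.grad is not None:
            p.grad[masks[name]] = 0.0        # freeze gradients of key parameters
    optimizer.step()
    return loss.item()
```

With a momentum-free optimizer such as plain SGD, zeroed gradients leave the protected weights exactly unchanged; a stateful optimizer such as Adam would additionally need the mask applied to its update, since its internal state can still move nominally frozen weights.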
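Similarly, the following sketch illustrates why careless model growth can disturb previously learned behaviour and how a growth step can be made function-preserving. It simply widens a linear layer, copying the old weights and zero-initializing the new units; the thesis's data-driven sparse expansion and on-data initialization are not reproduced here, and the helper name is hypothetical.

```python
# Illustrative sketch only: function-preserving widening of a linear layer.
# Growth that preserves the old input-output mapping avoids the kind of
# growth-induced forgetting (GIFt) described in the abstract.
import torch
import torch.nn as nn

def widen_linear(old: nn.Linear, extra_out: int) -> nn.Linear:
    """Return a wider layer whose first `old.out_features` outputs match the
    original layer exactly; new units start at zero (an assumption here)."""
    new = nn.Linear(old.in_features, old.out_features + extra_out,
                    bias=old.bias is not None)
    with torch.no_grad():
        new.weight.zero_()
        new.weight[: old.out_features] = old.weight     # keep old knowledge intact
        if old.bias is not None:
            new.bias.zero_()
            new.bias[: old.out_features] = old.bias
    return new

# The grown layer behaves identically to the old one on its original units,
# so previously learned behaviour is untouched before new-task training begins.
layer = nn.Linear(128, 64)
wider = widen_linear(layer, extra_out=16)
x = torch.randn(4, 128)
assert torch.allclose(layer(x), wider(x)[:, :64], atol=1e-6)
```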
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/14154

