| Author: | Shi, Guangyuan |
| Title: | Optimizing knowledge transfer in continual and multi-task learning environments |
| Advisors: | Wu, Xiao-ming (COMP) |
| Degree: | Ph.D. |
| Year: | 2025 |
| Subject: | Machine learning; Reinforcement learning; Computer multitasking; Hong Kong Polytechnic University -- Dissertations |
| Department: | Department of Computing |
| Pages: | xxii, 160 pages : color illustrations |
| Language: | English |
| Abstract: | Optimizing knowledge transfer is a key challenge in machine learning, especially in dynamic environments where tasks and data continually evolve. Conventional machine learning methods generally rely on the premise that the feature space and data distribution remain consistent between the training and testing phases. In reality, this condition is rarely met, as real-world data often exhibits substantial variability. This limitation reduces the usability and effectiveness of models, particularly when training data is insufficient, tasks have diverse distributions, or environments change, necessitating model retraining. In such settings, models must handle multiple tasks simultaneously while managing diverse and potentially conflicting objectives. Moreover, models must learn new tasks while retaining existing knowledge and swiftly adjust to new situations or tasks with minimal retraining effort. This thesis examines strategies for optimizing knowledge transfer in continual learning (CL) and multi-task learning (MTL) to improve their performance in practical applications. CL enables models to learn from a series of tasks while retaining knowledge from earlier ones, thus avoiding catastrophic forgetting, where new learning degrades previously acquired knowledge. We propose a novel approach that guides models to converge to flat local minima during initial training, so that adapting to new tasks requires only minimal parameter adjustments. This strategy reduces the likelihood of forgetting and enhances model robustness in dynamic environments, making it particularly effective for applications that must continually adapt to new data and tasks. In MTL, the challenge is to transfer knowledge across different tasks without causing negative transfer, where learning one task adversely affects performance on others. 
Negative transfer often arises from conflicting gradients during model updates, where the update direction is dominated by tasks with larger gradient magnitudes, hindering effective learning of the other tasks. To mitigate this, we introduce a method that identifies layers with severe gradient conflicts and switches them from shared to task-specific configurations. This approach prevents gradient conflicts in the remaining shared layers, ensuring balanced learning and improving overall model performance and generalization across tasks. Additionally, considering the increasing size of pretrained base models and the rising costs of knowledge transfer, we introduce a parameter-efficient fine-tuning (PEFT) algorithm that optimizes the adaptability of large language models (LLMs) by selectively fine-tuning only the most critical layers. By learning a binary mask for each low-rank weight matrix used in LoRA--a mask value of 0 indicates that the layer needs no LoRA adapter and its pretrained parameters remain unchanged--our approach significantly reduces memory overhead and computational costs while avoiding overfitting. This makes transfer learning more efficient and feasible in resource-constrained environments. In summary, this thesis explores a set of complementary methods aimed at improving knowledge transfer in machine learning under practical constraints. By addressing critical challenges such as continual adaptation, task interference, and computational efficiency, the proposed approaches enhance the robustness and practicality of transfer learning in real-world settings. These methods--focused on reducing forgetting in continual learning, mitigating gradient conflicts in multi-task learning, and improving parameter efficiency when fine-tuning large models--offer targeted solutions to common limitations in dynamic and resource-constrained environments. 
Experimental results support their effectiveness, showing improvements in model stability, generalization, and adaptability. Overall, this work offers both practical insights and methodological contributions that can inform future research and applications in scalable, efficient machine learning. The results have been published in or submitted to various top AI conferences. |
| Rights: | All rights reserved |
| Access: | open access |
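The abstract does not give implementation details for the flat-minima strategy. As an illustration only, one common way to bias training toward flat minima is a sharpness-aware update in the spirit of SAM: evaluate the gradient at a locally worst-case perturbed point and descend from there. The function below is a minimal sketch under that assumption on a toy quadratic loss; `sam_step`, `rho`, and `grad_fn` are hypothetical names, not the thesis's actual method.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware step: ascend to the nearby worst-case point
    w + rho * g/||g||, then descend from w using the gradient computed
    at that perturbed point, which favors flat minima."""
    g = grad_fn(w)
    g_norm = np.linalg.norm(g) + 1e-12
    w_adv = w + rho * g / g_norm   # worst-case perturbation within radius rho
    g_adv = grad_fn(w_adv)         # gradient at the perturbed point
    return w - lr * g_adv          # descent step from the original weights

# toy loss f(w) = 0.5 * ||w||^2, whose gradient is w itself
w = np.array([1.0, -2.0])
for _ in range(50):
    w = sam_step(w, lambda w: w)
```

After 50 steps the iterate settles near the (flat) minimum at the origin; in a real continual-learning setting `grad_fn` would backpropagate the task loss through the network.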
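The abstract's MTL method hinges on spotting layers where per-task gradients conflict. A simple proxy for "severe conflict", assumed here for illustration, is a negative mean pairwise cosine similarity between task gradients at a layer; flagged layers would then be switched from shared to task-specific. The helper below and its threshold are assumptions, not the thesis's exact criterion.

```python
import numpy as np

def conflicting_layers(task_grads, threshold=0.0):
    """task_grads: dict mapping layer name -> list of per-task gradient
    vectors for that layer. A layer is flagged when the mean pairwise
    cosine similarity of its task gradients falls below `threshold`,
    i.e., the tasks pull the shared weights in opposing directions."""
    flagged = []
    for layer, grads in task_grads.items():
        sims = []
        for i in range(len(grads)):
            for j in range(i + 1, len(grads)):
                gi, gj = grads[i], grads[j]
                denom = np.linalg.norm(gi) * np.linalg.norm(gj) + 1e-12
                sims.append(gi @ gj / denom)
        if np.mean(sims) < threshold:
            flagged.append(layer)
    return flagged

grads = {
    "layer1": [np.array([1.0, 0.0]), np.array([0.9, 0.1])],   # aligned
    "layer2": [np.array([1.0, 0.0]), np.array([-1.0, 0.1])],  # conflicting
}
print(conflicting_layers(grads))  # -> ['layer2']
```

In practice the gradient vectors would be the flattened per-task gradients of each layer's parameters, collected during a joint training step.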
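The PEFT method described above gates each LoRA adapter with a binary mask. The forward pass below sketches what a mask of 0 means mechanically: the layer computes only with its frozen pretrained weight, and a mask of 1 adds the low-rank update ΔW = BA. All names and shapes here are illustrative assumptions; the thesis's mask-learning procedure is not reproduced.

```python
import numpy as np

def lora_forward(x, W, A, B, mask):
    """Linear layer with an optionally masked LoRA adapter:
    y = x W^T + mask * x (B A)^T. When mask == 0, the pretrained
    parameters are used unchanged and no adapter memory is needed."""
    y = x @ W.T
    if mask:
        y = y + x @ (B @ A).T   # low-rank update delta_W = B A
    return y

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))      # batch of 2 inputs, dim 8
W = rng.normal(size=(4, 8))      # frozen pretrained weight
A = rng.normal(size=(2, 8))      # rank-2 LoRA factors
B = rng.normal(size=(4, 2))
y_off = lora_forward(x, W, A, B, mask=0)  # identical to the base model
y_on = lora_forward(x, W, A, B, mask=1)   # base model plus adapter
```

Learning the masks themselves (e.g., with a relaxation or straight-through estimator) is where the proposed algorithm's contribution lies; this sketch only shows the masked forward computation.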
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/13802

