Author: | Zhang, Yuji |
Title: | Forecast the future : dynamic natural language understanding in evolving social media environments |
Advisors: | Li, Jing (COMP); Li, Wenjie Maggie (COMP) |
Degree: | Ph.D. |
Year: | 2024 |
Subject: | Natural language processing (Computer science); Social media; Hong Kong Polytechnic University -- Dissertations |
Department: | Department of Computing |
Pages: | xvi, 133 pages : color illustrations |
Language: | English |
Abstract: | In the digital era, the rapid evolution of user behavior and language use within social media presents significant challenges and opportunities for natural language understanding and recommendation systems. This thesis addresses these challenges by exploring temporal adaptation mechanisms that enhance the predictive capability of models in noisy, ever-changing social media environments. The first aspect of the study focuses on predicting future user preferences amidst the dynamic and noisy environment of social media. By characterizing users’ hashtagging behavior, this work leverages a deep semantic space built with a pre-trained BERT model and a neural topic model through multitask learning. This approach integrates user history and hashtag contexts, surpassing existing methods by customizing user interests to align with evolving hashtag semantics via a novel personalized topic attention mechanism, as demonstrated through extensive experiments on a large-scale Twitter dataset. Building on the understanding of dynamic user preferences, the second aspect examines a more fundamental problem of text classification and addresses the deteriorating performance of models over time due to the evolving nature of language features in social media. Recognizing the limitations of static training setups, we empirically study social media natural language understanding (NLU) in a dynamic setup, where models are trained on past data and tested on future data. This setup better reflects realistic practice and allows for the evaluation of models’ adaptability to dynamic environments. We further explore leveraging unlabeled data created after a model is trained, examining the performance of unsupervised domain adaptation baselines based on auto-encoding (for topic modeling of past and future data) and pseudo-labeling (for classification).
Our experiments on four social media tasks reveal that while evolving environments universally challenge classification accuracy, combining auto-encoding and pseudo-labeling shows the best robustness in dynamic settings. After verifying the effectiveness of topic modeling in evolving and noisy social media environments, we introduce a more advanced method based on evolving topics. This third aspect of the research presents VIBE: Variational Information Bottleneck for Evolutions, a novel model designed to explore topic evolution using Information Bottleneck regularizers and multitask training. VIBE effectively distinguishes past and future topics, utilizing a small amount of unsupervised, time-adaptive data to dynamically recalibrate and maintain high performance on future data. Substantial experiments on Twitter validate VIBE’s ability to address the temporal degradation in model performance. The final aspect of this thesis investigates the dynamic interplay between user interests and online content semantics to improve personalized recommendations in evolving environments. Employing a reinforcement learning approach, we dynamically align users’ personal interests with evolving hashtag semantics. This ensures that recommendation models adapt to shifting trends and preferences, providing superior recommendations. Experiments conducted on datasets from Weibo and Twitter highlight the model’s capability to enhance user experience in the rapidly changing landscape of virtual communications. Together, these studies form a comprehensive framework for dynamic natural language understanding in evolving social media environments. This thesis provides effective solutions for maintaining the relevance and adaptability of models in the face of continuous linguistic and behavioral evolution, consequently benefiting dynamic user preference modeling and recommendations.
By addressing the challenges of temporal adaptation, this research advances the field and contributes to the development of more responsive and effective social media analysis tools. |
Rights: | All rights reserved |
Access: | open access |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/13246