Social trustworthy : trustily-aligned social interaction assistant

Yu, Erxin

Full metadata record

DC Field	Value	Language
dc.contributor	Department of Computing	en_US
dc.contributor.advisor	Li, Jing (COMP)	en_US
dc.contributor.advisor	Li, Wenjie Maggie (COMP)	en_US
dc.creator	Yu, Erxin	-
dc.identifier.uri	https://theses.lib.polyu.edu.hk/handle/200/14077	-
dc.language	English	en_US
dc.publisher	Hong Kong Polytechnic University	en_US
dc.rights	All rights reserved	en_US
dc.title	Social trustworthy : trustily-aligned social interaction assistant	en_US
dcterms.abstract	With the rapid growth of social media platforms such as Weibo, Twitter, and Rednote, these channels have become essential for accessing information, expressing opinions, and sharing daily life. However, the overwhelming volume of daily content creates intense competition for users' attention, making it challenging for creators to stand out. To address this, we developed a social interaction assistant that helps users craft high-quality posts, generate creative comments, and manage interactions efficiently. By analyzing social media trends and audience interests, the assistant produces personalized content that enhances user engagement and visibility. To make the assistant more trustworthy, we conducted safety assessments that analyze how harmful, biased, or unethical content is generated, with the aim of preventing such outputs. Additionally, we equipped the model with self-correction capabilities, enabling it to better adapt to the dynamically changing social media environment. This enhancement improves the model's generalization ability, allowing it to go beyond the limitations of its training data. As a result, the model is continuously optimized to generate higher-quality and safer social media content.	en_US
dcterms.abstract	To this end, we first study trendy response prediction to generate top-liked user replies to social media events automatically. We propose Popularity-Aligned Language Models (PopALM), which leverage reinforcement learning to distinguish responses that are more likely to be favored by a larger audience. Given the inherent noise in user "likes" as labels, we design a curriculum learning strategy within proximal policy optimization (PPO) to guide the model through an easy-to-hard training process, enabling it to focus on essential samples. We construct a large-scale Weibo dataset specifically for trendy response prediction. Experimental results show that PopALM significantly improves the performance of advanced language models, enabling the development of more effective and impactful social interaction assistants.	en_US
dcterms.abstract	The second aspect of this thesis is generating popular quote tweets to enhance public engagement. This task aims to create quote tweets that achieve higher popularity, as reflected by increased likes, replies, and retweets. While large language models (LLMs) excel in language generation, limited research has explored how these models can effectively learn and predict text popularity to better engage audiences. To address this gap, we propose a novel approach called Response-augmented Popularity-Aligned Language Model (RePALM). RePALM aligns language generation with popularity by leveraging augmented auto-responses from readers to provide deeper insights into public preferences. Using the Proximal Policy Optimization framework with a dual-reward mechanism, we jointly optimize for both the popularity of the generated quote tweets and their consistency with reader-provided auto-responses. To evaluate this approach, we construct two datasets: one consisting of quote tweets containing external links and another referencing others' tweets. Experimental results demonstrate that RePALM outperforms advanced language models that do not incorporate response augmentation, highlighting its effectiveness in driving public engagement through popular content generation.	en_US
dcterms.abstract	A trustworthy social interaction assistant must provide both high-quality and safe content. To this end, we examined the safety of LLMs in the context of multi-turn dialogue coreference. Specifically, we created a dataset comprising 1,400 questions across 14 categories, each designed to feature multi-turn coreference safety attacks. Through detailed evaluations of five widely used open-source LLMs, we observed a significant drop in safety performance under these multi-turn coreference safety attacks. To address this safety issue, we propose leveraging system prompts and Chain-of-Thought methods to enhance the safety of LLMs.	en_US
dcterms.abstract	The final aspect focuses on enhancing the self-correction capabilities of models, enabling them to better adapt to the dynamic nature of social media environments. Existing methods are limited by their reliance on training data, which constrains their generalization ability and makes it difficult for models to handle the ever-changing demands of social media content. To overcome these limitations, we propose Self-Error-Instruct (SEI), a framework that identifies error patterns and synthesizes more generalized training data. Using datasets such as GSM8K and MATH, we analyze bad cases, cluster error types, and generate targeted training data through a self-instruct approach. This data is further refined and used to fine-tune models, allowing them to break free from the constraints of traditional training data and improve their reasoning capabilities. Experiments on LLaMA3-8B-Instruct and Qwen2.5-Math-7B-Instruct demonstrate significant improvements in both in-domain and out-of-domain performance, showcasing the effectiveness of SEI in enhancing self-correction capabilities.	en_US
dcterms.extent	xxii, 125 pages : color illustrations	en_US
dcterms.isPartOf	PolyU Electronic Theses	en_US
dcterms.issued	2025	en_US
dcterms.educationalLevel	Ph.D.	en_US
dcterms.educationalLevel	All Doctorate	en_US
dcterms.accessRights	open access	en_US
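
Illustrative note on the dual-reward mechanism mentioned in the RePALM abstract: the record does not say how the popularity reward and the response-consistency reward are combined, so the following is only a minimal sketch under stated assumptions. It assumes both rewards are already scaled to [0, 1] and blends them with a normalized weighted sum. The names RewardWeights and combined_reward and the default weights are hypothetical and do not come from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical mixing weights; the abstract does not state the real ones."""
    popularity: float = 0.5
    consistency: float = 0.5


def combined_reward(
    popularity_score: float,
    consistency_score: float,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Blend two scalar rewards in [0, 1] into one scalar reward.

    Assumption: a normalized weighted sum. The thesis may combine the
    two signals differently.
    """
    for name, value in (
        ("popularity_score", popularity_score),
        ("consistency_score", consistency_score),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    total = weights.popularity + weights.consistency
    if total <= 0:
        raise ValueError("reward weights must sum to a positive value")

    blended = (
        weights.popularity * popularity_score
        + weights.consistency * consistency_score
    )
    # Dividing by the weight total keeps the result in [0, 1].
    return blended / total
```

In a PPO-style training loop, a scalar like this would typically be the per-sample reward passed to the policy update; changing the weights trades off predicted popularity against agreement with the augmented reader responses.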

Files in This Item:

File	Description	Size	Format
8530.pdf	For All Users	4.57 MB	Adobe PDF

Copyright Undertaking

As a bona fide Library user, I declare that:

I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/14077