Author: | Lu, Zexin |
Title: | Machine-aided online user engagements |
Advisors: | Li, Jing (COMP); Li, Qing (COMP) |
Degree: | Ph.D. |
Year: | 2022 |
Subject: | Online social networks; Natural language processing (Computer science); Hong Kong Polytechnic University -- Dissertations |
Department: | Department of Computing |
Pages: | xx, 127 pages : color illustrations |
Language: | English |
Abstract: | With the widespread use of social platforms, interpersonal communication has come to play an increasingly crucial role in our daily activities. Nevertheless, although every individual is part of society, many have not yet developed the ability to socialize well with others. They may unintentionally ruin a conversation or be reluctant to voice their opinions. To help them perform better in social interactions, this thesis proposes novel solutions that employ data-driven natural language processing (NLP) methods to give users support and guidance for engaging in online social interactions. To that end, we first measure how well NLP models understand user-generated content on social media. Specifically, we present the first benchmark to investigate how well state-of-the-art natural language understanding (NLU) models tackle social media tasks, where the texts usually exhibit the noise (e.g., informal writing) inherent in user-generated content. To build the benchmark, we gather two large-scale Chinese datasets from Weibo, 80K posts with crowd-sourced annotations and 3K posts with expert annotations, for three fundamental tasks (Chinese word segmentation, part-of-speech tagging, and named-entity recognition) to examine how well models acquire generic language understanding. In addition, model performance on popular social media applications, such as rumor detection, emoji prediction, sentiment analysis, and hashtag classification, is examined to investigate NLU models' capability of capturing specific semantics from social media messages. The experimental results demonstrate the effectiveness of popular language encoders from the BERT family in understanding social media messages; these encoders even obtained better results than human readers.
Then, we examine participants' behavior in conversations by estimating their effects on the residual life of a conversation, defined as the number of new turns that will occur in the conversation thread. While most previous work focuses on coarse-grained estimation that classifies the number of upcoming turns into two categories, we study fine-grained categorization over varying lengths of residual life. To this end, we propose a hierarchical neural model that jointly explores indicative representations from the content of turns and the structure of conversations in an end-to-end manner. Extensive experiments on both human-human and human-machine conversations demonstrate the superiority of our proposed model and the potential of NLP models to evaluate the degree of engagement in user discussions. Finally, we investigate how to actively draw engagement from users who prefer not to comment with words. We propose a novel task: generating vote questions for social media posts. Vote questions offer an easy way to hear the voice of the public and learn how people feel about important social topics. While most related work tackles formal language (e.g., exam papers), we generate vote questions for short, colloquial social media messages that exhibit severe data sparsity. To deal with this sparsity, we propose to encode user comments and discover the latent topics therein as context. These topics are then incorporated into a sequence-to-sequence (S2S) architecture for question generation, and into an extension of it with dual decoders that additionally yields vote answers. For experiments, we collect a large-scale Chinese dataset from Sina Weibo. The results show that our model outperforms popular S2S models that do not leverage topics from comments, and that the dual-decoder design further benefits the prediction of both questions and answers. |
Rights: | All rights reserved |
Access: | open access |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/11984