|Personalized deep reinforcement learning based recommendations
|Li, Wenjie (COMP)
|Recommender systems (Information filtering)
|Hong Kong Polytechnic University -- Dissertations
|Department of Computing
|xxii, 160 pages : color illustrations
|A recommender system is a powerful information-filtering tool that proactively recommends potentially interesting items to users. The majority of existing recommendation algorithms are essentially supervised learning (SL) based approaches, which learn static, passive, and short-sighted predictive models for the single-step recommendation problem. As a result, they cannot provide satisfactory solutions to the multi-step interactive recommendation problem that arises in more practical scenarios. A promising way to address this issue is to leverage the reinforcement learning (RL) paradigm and build an RL-based recommendation agent. Unlike SL-based systems, an RL-based agent learns a dynamic, proactive, and far-sighted recommendation policy that optimizes the cumulative reward received over the multi-step interactive recommendation process. In the literature, a number of RL-based recommendation agents, from the early tabular RL agents to the recently proposed deep RL agents, have demonstrated great potential across different recommendation scenarios and datasets. However, a fundamental domain-specific problem has rarely been noticed or investigated in the past: how to effectively model personalization and collaboration in an RL-based recommendation agent. Personalization means that the agent should model the individual characteristics of each user as fully as possible, while collaboration means that the agent should model the collaborative relationships (e.g., behavioral similarities) between different users as fully as possible. Both are crucial to providing high-quality personalized recommendations for the entire user community, and thus to optimizing the overall profit of the recommender system. The objective of this thesis is to develop truly personalized RL-based recommendation agents by systematically addressing the issues of personalization and collaboration modeling.
Without loss of generality, we adopt value-based deep RL as the basis of our research. We observe that the performance of a value-based deep RL agent is affected by three key components: the state/action representation module, which transforms raw states and actions into high-level representations; the action-value prediction module, which outputs action-values based on those high-level representations; and the Q-learning module, which determines how to update the Q-network (i.e., the combination of the first two modules) toward the optimal action-value function corresponding to an optimal policy. Based on these observations, and inspired by advances in SL-based recommendation, we develop four novel and effective value-based deep RL recommendation agents. Each proposed agent substantially improves one or more of the three modules by incorporating personalization and collaboration in different ways.
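The abstract names the three modules but gives no implementations. As a rough, illustrative sketch of the decomposition only (the embedding table, mean-pooled history state, and linear scorer below are invented stand-ins, not the thesis's actual networks):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen only for illustration.
N_ITEMS, DIM = 50, 8
GAMMA, LR = 0.9, 0.1

# Module 1: state/action representation. Here an embedding table for
# items; a state (the user's interaction history) is summarized as the
# mean embedding of previously consumed items.
item_emb = rng.normal(scale=0.1, size=(N_ITEMS, DIM))

def represent_state(consumed_items):
    if not consumed_items:
        return np.zeros(DIM)
    return item_emb[consumed_items].mean(axis=0)

# Module 2: action-value prediction. A linear scorer over the
# concatenated state and action representations stands in for the
# Q-network's prediction head.
w = rng.normal(scale=0.1, size=2 * DIM)

def q_value(state_vec, item):
    return w @ np.concatenate([state_vec, item_emb[item]])

# Module 3: Q-learning. Nudge the prediction for (s, a) toward the
# bootstrapped target r + gamma * max_a' Q(s', a').
def q_learning_step(state_vec, action, reward, next_state_vec):
    global w
    target = reward + GAMMA * max(q_value(next_state_vec, a) for a in range(N_ITEMS))
    feat = np.concatenate([state_vec, item_emb[action]])
    td_error = target - w @ feat
    w = w + LR * td_error * feat
    return td_error
```

Each of the four agents described below can be read as replacing one or more of these stand-in modules with a personalization- and collaboration-aware counterpart.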
The first work presents the User-specific Deep Q-network (UDQN), a two-stage pipeline agent that first constructs latent vector representations of states and actions using matrix factorization (MF) and then estimates action-values from those representations using Q-learning. In UDQN, the MF-based state/action representation module effectively models personalization and collaboration by mapping all users into a shared latent feature space. The second work describes the Graph Convolutional Q-network (GCQN), an end-to-end agent that directly estimates action-values from graph-structured states and actions. GCQN integrates a graph convolutional network (GCN) based state/action representation module, which models personalization and collaboration by aggregating informative features from the target user's local neighborhood in the user-item bipartite graph, via feature propagation along "user-item-user" paths. The third work introduces the Social Attentive Deep Q-network (SADQN), which builds on UDQN to explicitly integrate personalization and collaboration by predicting action-values as the combination of a personal action-value function and a social action-value function. The two functions estimate action-values based on the preferences of individual users and of their friends in the social network, respectively. In particular, the social action-value function models the collaboration between the target user and his/her friends by leveraging social attention to capture the social influence between them. Finally, the fourth work presents the Personalized Deep Q-network (PDQN), which estimates fully personalized action-values based on the user-specific information of the target user and the general state information shared by all users.
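The MF step underlying UDQN's representations can be illustrated with a toy masked factorization; the rating matrix, rank, and learning rate below are invented for this sketch and do not reflect the thesis's experiments:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy rating matrix (0 = unobserved); values are invented for
# illustration only.
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [0, 1, 5, 4],
              [1, 0, 4, 5]], dtype=float)
K, LR, STEPS = 2, 0.01, 5000

# Factorize R ~= U @ V.T on the observed entries only. The latent
# rows then serve as shared state/action representations: users with
# similar rating behavior land close together in the latent space,
# which is how MF captures both personalization and collaboration.
U = rng.normal(scale=0.1, size=(R.shape[0], K))
V = rng.normal(scale=0.1, size=(R.shape[1], K))
mask = R > 0
for _ in range(STEPS):
    err = mask * (R - U @ V.T)      # residual on observed entries
    U, V = U + LR * err @ V, V + LR * err.T @ U
```

In a UDQN-style pipeline, the learned user vector U[u] and item vectors V would then feed the Q-learning stage in place of raw user/item IDs.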
Unlike the UDQN and GCQN agents, which implicitly encode personalization and collaboration in the state/action representations, PDQN explicitly models them in a brand-new personalized Q-network architecture that consists of a user-specific action-value function and a general action-value function. Moreover, unlike the SADQN agent, PDQN does not rely on additional social information, making it more domain-independent and applicable to a wider range of recommendation scenarios and datasets. Collaboration is further modeled in PDQN through a novel collaborative Q-learning module. Extensive experiments on real-world datasets demonstrate that the proposed agents achieve state-of-the-art performance. More importantly, the ideas, methods, and techniques proposed in this thesis are both insightful and generic; we expect them to promote the advancement of RL-based recommendation and to inspire researchers in related fields.
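The abstract does not give PDQN's equations; the additive decomposition below is only one hypothetical reading of "a user-specific action-value function plus a general action-value function," with all names, sizes, and the linear forms invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy dimensions for illustration only.
N_USERS, N_ITEMS, DIM = 10, 20, 4

# User-specific component: per-user embeddings that score candidate
# items, capturing individual tastes.
user_emb = rng.normal(scale=0.1, size=(N_USERS, DIM))
item_emb = rng.normal(scale=0.1, size=(N_ITEMS, DIM))
# General component: one weight vector shared by every user, applied
# to a state summary (here the mean embedding of consumed items).
w_general = rng.normal(scale=0.1, size=DIM)

def personalized_q(user, consumed_items, item):
    """Q(u, s, a) = Q_user(u, a) + Q_general(s, a): the first term is
    user-specific, the second depends only on state shared by users."""
    q_user = user_emb[user] @ item_emb[item]
    state = item_emb[consumed_items].mean(axis=0) if consumed_items else np.zeros(DIM)
    q_general = (w_general * state) @ item_emb[item]
    return q_user + q_general
```

Note how two users with identical histories still receive different action-values through the user-specific term, which is the personalization property the architecture is built around.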
|All rights reserved