Author: Wang, Jia
Title: Data-driven deep reinforcement learning for decision-making applications
Advisors: Cao, Jiannong (COMP)
Degree: Ph.D.
Year: 2021
Subject: Reinforcement learning
Machine learning
Big data
Hong Kong Polytechnic University -- Dissertations
Department: Department of Computing
Pages: xvi, 100 pages : color illustrations
Language: English
Abstract: Decision-making applications have become an important part of today's competitive, knowledge-based society and benefit many important areas. Machine learning has made significant progress recently, thanks to the availability of data at a scale not seen before. By learning from past data, machine learning can make better decisions than approaches that rely solely on domain knowledge. Among various machine learning algorithms, reinforcement learning (RL) is the most promising, because it learns to map current conditions to decision solutions and considers the impact of current decisions on subsequent ones. Typically, an RL agent learns through trial and error, using data harvested from its own experience to make informed decisions. In many practical applications, a large amount of offline-collected data with rich prior information already exists, so the ability to learn from big data becomes the key for reinforcement learning to solve realistic decision-making problems. Unlike traditional RL methods, which interact with an online environment, learning a strategy from a fixed dataset is particularly challenging, for three reasons. First, data generated from daily system operations are not independent and identically distributed. By training on a partial dataset, an RL agent can converge to a model that is then reluctant to explore the remaining data and further improve its performance. Second, without a proper understanding of the underlying data distribution, an RL agent may learn a decision-making strategy that easily overfits the observed samples in the training set but fails to generalize to unseen samples in the testing set. Third, the RL training process can be very unstable when data are noisy and highly variant.
In this thesis, we study data-driven reinforcement learning, aiming to derive decision strategies from big data collected offline. The first contribution of this thesis is enabling an RL agent to learn strategies from data with repetitive patterns. To force an RL agent to fully "explore" massive data, we partition the historical big dataset into multi-batch datasets. Specifically, we study, in both theory and practice, how an RL agent can incrementally improve its strategy by learning from the multi-batch datasets. The second contribution is that we explore the underlying data distribution under the reinforcement learning scheme. With the learned generative distribution, one can select the hardest (most representative) samples to train the strategy model, thus achieving better application performance. The third contribution is that we apply the RL method to learn strategies from high-variance data. Specifically, we bound the parameter distribution of the new strategy to stay relatively close to that of its predecessor strategy, which stabilizes training. Finally, through data-driven reinforcement learning, we thoroughly study various applications, including social analysis, dynamic resource allocation, and multi-agent pattern formation.
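
The following minimal Python sketch (not the author's code; all names and architectures are illustrative assumptions) shows how the first and third contributions could fit together: a strategy is improved incrementally over multi-batch offline datasets, while a clipped probability ratio, in the spirit of proximal policy optimization, keeps each new strategy's action distribution close to its predecessor's to stabilize training on noisy data.

import copy
import torch
import torch.nn as nn

class Policy(nn.Module):
    # Categorical policy over discrete actions (illustrative architecture).
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))

    def log_prob(self, obs, act):
        return torch.distributions.Categorical(logits=self.net(obs)).log_prob(act)

def update_on_batch(policy, old_policy, batch, optimizer, clip_eps=0.2):
    # One proximal update on a single offline data batch; advantages are
    # assumed to be precomputed from the logged data.
    obs, act, adv = batch
    logp = policy.log_prob(obs, act)
    with torch.no_grad():
        logp_old = old_policy.log_prob(obs, act)
    ratio = torch.exp(logp - logp_old)
    # Clipping the ratio bounds the new strategy's distribution near its
    # predecessor's, stabilizing training on high-variance data.
    surrogate = torch.min(ratio * adv,
                          torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv)
    loss = -surrogate.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

obs_dim, n_actions = 8, 4
policy = Policy(obs_dim, n_actions)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
# Partition a (toy, randomly generated) historical dataset into multi-batch
# datasets and learn from them incrementally.
multi_batch_datasets = [
    (torch.randn(256, obs_dim),
     torch.randint(0, n_actions, (256,)),
     torch.randn(256))
    for _ in range(5)]
for batch in multi_batch_datasets:
    old_policy = copy.deepcopy(policy)  # freeze the predecessor strategy
    print("batch loss:", update_on_batch(policy, old_policy, batch, optimizer))

A second sketch, under the same caveats, illustrates the sample-selection idea of the second contribution: fit a simple density model to the data (a Gaussian here stands in for whatever generative model the thesis actually uses) and train the strategy model on the lowest-density, i.e., hardest, samples.

import numpy as np

def hardest_samples(obs, k):
    # Indices of the k samples least likely under a Gaussian fit to obs;
    # Mahalanobis distance is monotone in negative log-likelihood here.
    mu = obs.mean(axis=0)
    cov = np.cov(obs, rowvar=False) + 1e-6 * np.eye(obs.shape[1])
    inv = np.linalg.inv(cov)
    diff = obs - mu
    scores = np.einsum("ij,jk,ik->i", diff, inv, diff)
    return np.argsort(scores)[-k:]

obs = np.random.randn(1000, 8)
idx = hardest_samples(obs, 64)  # train the strategy model on obs[idx] first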
Rights: All rights reserved
Access: open access

Files in This Item:
File | Description | Size | Format
5553.pdf | For All Users | 11.53 MB | Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/11095