Author: Zhu, Ziqing
Title: Applications of reinforcement learning on deregulated active distribution networks
Advisors: Chan, K. W. Kevin (EE)
Degree: Ph.D.
Year: 2023
Subject: Electric power distribution
Electric power production
Hong Kong Polytechnic University -- Dissertations
Department: Department of Electrical Engineering
Pages: xvi, 177 pages : color illustrations
Language: English
Abstract: The deregulation of the power industry has transformed the modern power system from a vertically integrated regime into a paradigm incorporating the electricity market and multiple stakeholders. Such a transformation has effectively promoted the maximization of social welfare through fair and competitive market transactions. Another emerging trend in modern power systems is the rapid growth of distributed energy resources (DERs), especially the renewable distributed generations (RDGs) introduced to reduce carbon emissions. However, the high penetration of RDGs in the distribution network, with their dispersed geographical distribution and uncertain power output, has brought significant challenges to the secure and economical operation of the network. In addition, these DERs are invested in and managed by autonomous stakeholders and therefore cannot be directly controlled by the distribution system operator (DSO).
Nevertheless, owing to their payoff-oriented nature, DERs could be effectively exploited to provide flexible energy procurement for both the local load demand and the connected transmission grid if they are financially incentivized through formal market transactions in a distribution-level electricity market (DEM). The advantages of developing the DEM include, but are not limited to, efficient energy management and procurement, and the exploitation of other localized flexible resources such as demand-side participants. Motivated by these benefits, there has been increasing interest in investigating DEM design and implementation. In recent studies, typical and emerging features of the DEM paradigm include 1) the participation of distributed virtual alliances (DVAs), including virtual power plants (VPPs) and virtual microgrids (VMGs), and 2) the coordination of the energy market (EM), the ancillary service market (ASM), and the carbon emission auction market (CEAM). These features form the general background contextualizing the research presented in this thesis.
This research investigates two key issues arising from DEM operation: the optimal bidding strategy of the DVAs and the optimal dispatching strategy of the DSO. The DVAs, as market participants competing with one another for successful bids, need to consider the decisions of rival DVAs and estimate the Nash Equilibrium (NE) as their optimal bidding strategies. The DSO, as the market operator, needs on the one hand to detect any potential behaviors that undermine fair market transactions, such as arbitrage and abuse of market power, by simulating all possible NE scenarios before implementing any new market rules; on the other hand, it has to consider the impact of its dispatching strategies on both economic efficiency and the DVAs' bidding strategy preferences, so as to maximize total social welfare based on the principle of incentive compatibility. For the optimization of bidding and dispatching strategies, Reinforcement Learning (RL) has demonstrated better computational performance than conventional optimization methods and can potentially provide an effective solution by treating the bidding and dispatching problems as dynamic programming (DP) problems.
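To make the DP view above concrete, the following minimal sketch casts a single bidder's decision problem as an MDP and solves it with tabular Q-learning. This is an editor's illustrative example only: the discretized state/action spaces, the placeholder market environment, and all parameter values are assumptions, not the models developed in the thesis.

```python
import numpy as np

# Illustrative only: a bidding problem cast as an MDP (S, A, P, r, gamma) and
# solved with tabular Q-learning. The environment below is a placeholder, not
# the thesis' market model.
n_states, n_actions = 10, 5          # assumed: discretized market states / bid levels
gamma, alpha, eps = 0.95, 0.1, 0.1   # assumed discount, learning rate, exploration
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))

def market_step(s, a):
    """Hypothetical environment: next market state and the bidder's payoff."""
    s_next = int(rng.integers(n_states))
    reward = -abs(a - s % n_actions) + rng.normal(0.0, 0.1)  # toy payoff shape
    return s_next, reward

s = int(rng.integers(n_states))
for _ in range(10_000):
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
    s_next, r = market_step(s, a)
    # Temporal-difference (Bellman) update -- the dynamic-programming core of RL
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next
```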
In this thesis, a comprehensive review of existing applications of RL techniques in deregulated power systems, covering both the optimal bidding and the optimal dispatching problems, is first presented in terms of Markov Decision Process (MDP) model formulation and the applicability and feasibility of various RL algorithms. In addition, the applicability and potential obstacles of RL deployment in real-world implementations are discussed, mainly concerning the fairness of electricity market transactions, the economic efficiency of electricity market operation, and the security of power network operation. The identified issues are tackled in this research as follows.
An interactive bidding and dispatching model of the VMGs and the DSO is then proposed, in which the downstream VMGs perform self-dispatching while selling both energy and ancillary service procurement to the DSO in the day-ahead (DA) DEM. The bi-level bidding and market clearing problem is modelled as an MDP and solved with the Win-or-Learn-Fast Policy Hill-Climbing (WoLF-PHC) algorithm, an online and fully distributed training scheme that enables the VMGs to dynamically update their bidding strategies based on previous market clearing results. The VMGs thereafter conduct real-time (RT) economic dispatching considering the conditional value-at-risk (CVaR) of penalties caused by renewable curtailment, load loss, and failure to provide the energy or ancillary service committed to the DSO. Finally, evolutionary game theory (EGT) with replicator dynamic equations (RDEs) is adopted to analyze the inherent dynamics of the proposed multi-agent reinforcement learning (MARL) scheme driven by WoLF-PHC, revealing the relation between the VMGs' bidding strategy convergence and the trading paradigm.
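For readers unfamiliar with WoLF-PHC, the sketch below shows the generic single-agent update from Bowling and Veloso's algorithm: a Q-learning step followed by policy hill-climbing with a "win-or-learn-fast" variable learning rate. It is an illustrative simplification, not the thesis' VMG bidding implementation; the function name, array layout, and parameter values are assumed for the example, and the simplex projection is simplified relative to the original algorithm.

```python
import numpy as np

def wolf_phc_update(Q, pi, pi_avg, counts, s, a, r, s_next,
                    alpha=0.1, gamma=0.95, delta_win=0.01, delta_lose=0.04):
    """One generic WoLF-PHC step for a single agent (illustrative sketch).

    Q, pi, pi_avg have shape (n_states, n_actions); counts has shape (n_states,).
    In the bidding setting, r would come from the previous market clearing result.
    """
    n_actions = Q.shape[1]

    # 1) Q-learning update from the observed reward
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

    # 2) Update the running average policy
    counts[s] += 1
    pi_avg[s] += (pi[s] - pi_avg[s]) / counts[s]

    # 3) "Win or learn fast": small step when the current policy beats the
    #    average policy under Q, large step otherwise
    delta = delta_win if pi[s] @ Q[s] > pi_avg[s] @ Q[s] else delta_lose

    # 4) Hill-climb toward the greedy action, then renormalize (simplified
    #    projection; the original caps each decrement at the action's probability)
    greedy = int(Q[s].argmax())
    pi[s] -= delta / (n_actions - 1)
    pi[s, greedy] += delta + delta / (n_actions - 1)
    pi[s] = np.clip(pi[s], 1e-6, None)
    pi[s] /= pi[s].sum()
```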
The aforementioned VMG-DSO bi-level interaction model is then extended to an environment combining a joint electricity market (ECM, comprising the EM and the ASM) with the CEAM and incorporating multiple DVAs, including controllable distributed generators (CDGs), VPPs, and VMGs. This model allows the DVAs to modify their bidding strategies in the joint market while accounting for the uncertainty of RDG output, which may result in penalties for the DVAs due to the deviation between DA allocation and RT procurement. A new Meta-Learning-based Win-or-Learn-Fast Policy Hill-Climbing (MLWoLF-PHC) algorithm, which not only enables fully distributed bidding strategy modification but also performs well under uncertainty as a risk-averse method, is proposed to solve this model. The computational performance of the proposed algorithm, the equilibrium analysis of the joint market, and the impact of the CEAM on the converged market clearing price of the ECM are investigated in the case studies.
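The DA-RT deviation penalty described above can be illustrated with a toy settlement function: day-ahead revenue plus real-time balancing, minus a carbon cost and a penalty proportional to the gap between the DA allocation and the RT delivery. Every term, name, and rate here is hypothetical and chosen purely for illustration; this is not the thesis' market settlement rule.

```python
def dva_payoff(da_cleared_mwh, da_price, rt_delivered_mwh, rt_price,
               carbon_allowance_mwh, carbon_price, penalty_rate):
    """Hypothetical settlement for one DVA in a joint ECM + CEAM setting.

    da_cleared_mwh       -- energy allocated in the day-ahead clearing
    rt_delivered_mwh     -- energy actually procured/delivered in real time
    carbon_allowance_mwh -- emission-equivalent quota bought in the carbon auction
    penalty_rate         -- assumed charge per unit of DA-RT deviation
    """
    deviation = abs(da_cleared_mwh - rt_delivered_mwh)
    energy_revenue = (da_cleared_mwh * da_price
                      + (rt_delivered_mwh - da_cleared_mwh) * rt_price)
    carbon_cost = max(rt_delivered_mwh - carbon_allowance_mwh, 0.0) * carbon_price
    return energy_revenue - carbon_cost - penalty_rate * deviation
```

Under such a toy rule, a larger RDG forecast error widens the DA-RT deviation and directly erodes the DVA's profit, which is the kind of exposure that motivates the risk-averse bidding behaviour the MLWoLF-PHC algorithm is designed to handle.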
The aforementioned model is further modified to enable both the DSO and the DVAs to consider the risk of deviations in the energy schedule between DA and RT, which may result in financial losses from balancing services. Furthermore, the concept of Robust Nash Equilibrium (RNE) is introduced, referring to a set of strategies that obtain the maximum profit (i.e., are "robust") in the worst-case scenario. Such conservative decision making is preferred by market participants for risk mitigation at the early stage of market operation, when no previous operating experience is available. The RNE of the DVA-DSO interaction model is solved by a new Distributed Robust Multi-Agent Deep Deterministic Policy Gradient (DRMA-DDPG) algorithm, a fully distributed online optimization that converges to the RNE, i.e., the conservative bidding strategies of the DVAs and the dispatching decisions of the DSO. Its high computational performance is demonstrated in the case studies, and the strategic decisions of the DVAs and the DSO are thoroughly analyzed.
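The RNE condition itself can be illustrated on a tiny discrete game: a strategy profile is a robust NE if no player can unilaterally improve its worst-case payoff taken over the uncertainty scenarios. The brute-force check below is a conceptual illustration only, with random payoffs standing in for the market; the thesis instead handles continuous bidding and dispatching strategies with the proposed DRMA-DDPG algorithm.

```python
import itertools
import numpy as np

def is_robust_nash(payoffs, profile):
    """Check the RNE condition on a small discrete game (illustrative only).

    payoffs[k] has shape (n_scenarios, n_actions_1, ..., n_actions_K) and gives
    player k's profit in each uncertainty scenario; a profile is an RNE if no
    player can raise its worst-case (min over scenarios) payoff by deviating.
    """
    for k in range(len(profile)):
        current = payoffs[k][(slice(None),) + tuple(profile)].min()
        for alt in range(payoffs[k].shape[1 + k]):
            deviated = list(profile)
            deviated[k] = alt
            if payoffs[k][(slice(None),) + tuple(deviated)].min() > current + 1e-9:
                return False
    return True

# Toy example: 2 players, 3 uncertainty scenarios, 2 actions each, random payoffs.
rng = np.random.default_rng(1)
payoffs = [rng.normal(size=(3, 2, 2)) for _ in range(2)]
profiles = itertools.product(range(2), range(2))
print("Pure-strategy RNE profiles:", [p for p in profiles if is_robust_nash(payoffs, p)])
# Note: the list may be empty -- a pure-strategy RNE need not exist in general.
```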
Finally, some potential future directions of this research are outlined, including 1) the incorporation of a decentralized DEM operation paradigm supported by distributed computing architectures such as edge computing (EC) and federated learning (FL), and 2) the application of emerging techniques such as Inverse Reinforcement Learning (IRL). These promising topics can be regarded as significant extensions to be further investigated to facilitate future DEM operations.
Rights: All rights reserved
Access: open access

Files in This Item:
File: 6776.pdf (For All Users) | Size: 9.91 MB | Format: Adobe PDF



Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/12329