| Author: | Wei, Xiang |
| Title: | Resilient power system operation with reinforcement learning |
| Advisors: | Bu, Siqi (EEE); Chan, Ka Wing (EEE) |
| Degree: | Ph.D. |
| Year: | 2025 |
| Department: | Department of Electrical and Electronic Engineering |
| Pages: | xv, 162 pages : color illustrations |
| Language: | English |
| Abstract: | The ongoing evolution toward low-carbon and decentralized power systems, driven by high renewable penetration and the widespread integration of inverter-based distributed energy resources (DERs), has raised significant concerns about the security, stability, and resilience of both transmission and distribution systems. These systems are increasingly exposed to high-impact, low-probability events such as extreme weather, cyber-attacks, multi-component failures, and real-time operational uncertainties. Traditional model-based optimization approaches, such as security-constrained optimal power flow (SCOPF) and contingency-constrained optimal power flow (CCOPF), are mathematically rigorous but often suffer from poor scalability, long computation times, and limited adaptability to rapidly changing conditions. New intelligent decision-making methodologies are therefore urgently needed that can handle uncertainty, maintain physical constraint feasibility, and respond quickly in both centralized and decentralized operation frameworks.

This thesis addresses these challenges by proposing a deep reinforcement learning (DRL)-based framework for resilient and adaptive power system operation under uncertainty. The work spans three interconnected layers of power system control: transmission system scheduling under contingencies, real-time voltage regulation in active distribution networks, and coordinated transmission and distribution (T&D) load restoration during emergency events. These layers are investigated through four contributions, each incorporating DRL techniques tailored to the respective operational requirements and system architectures.

The first contribution introduces a novel adversarial learning-based approach to the CCOPF problem under worst-case N-k contingency conditions. A defender-attacker soft actor-critic (DA-SAC) framework is proposed, in which two non-cooperative agents, representing the system operator and an adversarial uncertainty generator, interact within a reinforcement learning environment. The defender agent learns robust dispatch actions, while the attacker agent identifies worst-case contingency scenarios in a discrete action space. The algorithm embeds constraint violation information directly into the reward function and employs dual-timescale policy updates to improve convergence and learning stability. This approach shifts robust power system operation from static, model-based optimization to a dynamic, game-theoretic learning paradigm.

The second contribution extends the SCOPF model into a two-stage preventive-corrective control framework incorporating fast-response virtual power plants (VPPs). The model is formulated as a constrained Markov decision process (CMDP) and solved with a Lagrangian-based soft actor-critic (L-SAC) algorithm. Preventive and corrective agents are trained to minimize pre-contingency risk and post-contingency recovery costs while satisfying AC power flow constraints. A state-dependent Lagrange multiplier mechanism enforces safety constraints in real time without relying on static penalty parameters, and the inclusion of VPPs adds the flexibility needed to adjust dynamically to unexpected load and generation fluctuations.
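
The state-dependent multiplier mechanism above is easiest to see through the generic Lagrangian primal-descent/dual-ascent template it builds on. The following minimal Python sketch applies that template to a hypothetical scalar problem; the objective, constraint, and step sizes are illustrative stand-ins, not the thesis's formulation, which embeds the multiplier inside soft actor-critic updates.

```python
# Minimal primal-descent / dual-ascent sketch of the Lagrangian idea
# behind constrained RL methods such as L-SAC. Hypothetical toy problem
# (not from the thesis):
#   minimize f(x) = (x - 3)^2   subject to   g(x) = x - 2 <= 0
# The Lagrangian is L(x, lam) = f(x) + lam * g(x), with lam >= 0.

def lagrangian_grad_x(x, lam):
    return 2.0 * (x - 3.0) + lam   # dL/dx

def g(x):
    return x - 2.0                 # constraint residual (violation if > 0)

x, lam = 0.0, 0.0
eta_x, eta_lam = 0.05, 0.10        # primal and dual step sizes
for _ in range(5000):
    x -= eta_x * lagrangian_grad_x(x, lam)   # primal descent on L
    lam = max(0.0, lam + eta_lam * g(x))     # dual ascent: lam grows
                                             # while the constraint is violated

print(f"x = {x:.3f}, lambda = {lam:.3f}")    # converges to x = 2, lambda = 2
```

The multiplier rises automatically while the constraint is violated and settles once feasibility is reached, which is the property that lets an L-SAC-style method enforce safety without hand-tuned static penalty weights; making the multiplier a learned function of the system state is the refinement the thesis adds.
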
The third contribution focuses on voltage regulation in active distribution networks (ADNs), where high penetration of inverter-based DERs causes frequent and unpredictable voltage violations. A hierarchical multi-mode voltage control strategy is proposed, featuring day-ahead dispatch of on-load tap changers (OLTCs) and capacitor banks via single-agent RL, and real-time inverter-based control using a multi-agent SAC (MASAC) algorithm with an embedded attention mechanism. The attention module enables each agent to prioritize relevant local observations, ensuring stable policy learning even in large-scale multi-agent environments. The voltage regulation problem is further decomposed into three dynamic operational modes: power loss minimization, undervoltage mitigation, and overvoltage correction. This decomposition allows the system to respond flexibly to varying operating conditions.

The fourth contribution addresses real-time coordination of load restoration across transmission and distribution systems under N-k emergency conditions. A distributed DRL architecture is proposed, comprising a centralized SAC controller for the transmission system and a complementary attention-enhanced MASAC controller for the distribution system. A VPP is introduced as an aggregator to coordinate distributed DERs and reduce communication burdens. The hierarchical architecture enables asynchronous but coherent interaction between system layers, ensuring scalable and rapid recovery under contingency conditions, while the attention mechanism improves inter-agent coordination and decision accuracy during system-wide restoration.

Collectively, the four contributions form a comprehensive and integrated framework for enhancing the resilience, adaptability, and operational efficiency of modern power systems under contingencies and uncertainties. By systematically addressing three critical aspects, transmission dispatch against worst-case contingencies, dynamic voltage regulation in active distribution networks, and real-time coordinated restoration across transmission and distribution systems, this work bridges the gap between traditional model-based optimization and data-driven, learning-based control. The proposed reinforcement learning strategies are tailored to overcome key obstacles, such as computational delays, model inaccuracies, and coordination inefficiencies, that have historically limited the practical deployment of robust control frameworks in real-world systems. By incorporating multi-agent models, adversarial training mechanisms, and hierarchical decision-making structures, the thesis lays the foundation for autonomous, decentralized, and scalable control methodologies that can adapt to evolving system configurations and unforeseen operational scenarios.

Extensive case studies on the IEEE 30-bus and 118-bus systems and on modified distribution systems validate the effectiveness and generalizability of the proposed methods, establishing a strong foundation for the next generation of learning-augmented decision support systems in modern power networks. Ultimately, this thesis contributes toward realizing resilient, sustainable, and smart grids capable of ensuring security, stability, and flexibility under the transformative pressures of high renewable integration, decentralization, and digitalization. |
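
Since the attention-enhanced MASAC controllers in the third and fourth contributions both hinge on agents weighting one another's observations, the following minimal NumPy sketch shows scaled dot-product attention over neighbour observations; the agent count, dimensions, and random projections are hypothetical placeholders for learned parameters, not values from the thesis.

```python
import numpy as np

# Hypothetical setup: one agent attends over the observations of 4
# neighbouring agents, each encoded as an 8-dimensional feature vector.
rng = np.random.default_rng(0)
d = 8
neighbour_obs = rng.normal(size=(4, d))   # one row per neighbouring agent
own_obs = rng.normal(size=(d,))           # this agent's own observation

# Random projection matrices stand in for learned attention parameters.
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

q = own_obs @ W_q                         # query vector, shape (d,)
K = neighbour_obs @ W_k                   # key matrix, shape (4, d)
V = neighbour_obs @ W_v                   # value matrix, shape (4, d)

# Scaled dot-product attention: each weight says how much this agent
# should rely on the corresponding neighbour's observation.
scores = K @ q / np.sqrt(d)               # one score per neighbour
weights = np.exp(scores - scores.max())
weights /= weights.sum()                  # softmax over neighbours
context = weights @ V                     # weighted observation summary

print("attention weights:", np.round(weights, 3))
```

The softmax weights indicate how strongly the agent relies on each neighbour, which is the "prioritize relevant local observations" behaviour the abstract attributes to the attention module.
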
| Rights: | All rights reserved |
| Access: | open access |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/14104

