Author: Wang, Yue
Title: An augmented Lagrangian method for training recurrent neural networks with sample average approximation
Advisors: Chen, Xiaojun (AMA)
Wong, Heung (AMA)
Guo, Xin (AMA)
Degree: Ph.D.
Year: 2025
Department: Department of Applied Mathematics
Pages: xxii, 104 pages : color illustrations
Language: English
Abstract: Recurrent neural networks (RNNs) are widely used to model sequential data in areas such as natural language processing, speech recognition, machine translation, and time series forecasting. Training an RNN with nonsmooth activation functions is formulated as an unconstrained optimization problem whose objective function is nonconvex, nonsmooth, and highly composite, which poses significant challenges. State-of-the-art optimization methods, such as gradient descent-based methods (GDs) and stochastic gradient descent-based methods (SGDs), rely on a generalized gradient of the nonsmooth objective function that may not be well defined, and they lack rigorous convergence analysis in this setting. In this thesis, we propose an augmented Lagrangian method (ALM) to solve the nonconvex, nonsmooth, and highly composite optimization problem and provide a rigorous convergence analysis. Moreover, the unconstrained problem arising from RNN training is typically a sample average approximation (SAA) of an original optimization problem whose objective function is an expectation. It is therefore necessary to prove that any accumulation point of minimizers and stationary points of the SAA problems is almost surely a minimizer and a stationary point, respectively, of the original problem. The thesis is divided into two parts.
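To make the "nonconvex, nonsmooth and highly composite" structure concrete, the following is a minimal sketch (not from the thesis) of an SAA training objective for a simple ReLU RNN: the nonsmooth activation is composed with itself T times, and the loss is averaged over N sampled sequences. All dimensions and weights here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: hidden d, input p, sequence length T, sample size N.
d, p, T, N = 8, 4, 10, 32
U = rng.standard_normal((d, d)) * 0.1   # recurrent weights
W = rng.standard_normal((d, p)) * 0.1   # input weights
v = rng.standard_normal(d) * 0.1        # output weights

def rnn_loss(U, W, v, X, y):
    """SAA objective: average squared loss of a ReLU RNN over N sequences.

    The T-fold composition h_t = relu(U h_{t-1} + W x_t) is what makes the
    objective nonconvex, nonsmooth, and highly composite.
    """
    total = 0.0
    for i in range(X.shape[0]):
        h = np.zeros(d)
        for t in range(T):
            h = np.maximum(U @ h + W @ X[i, t], 0.0)  # nonsmooth ReLU activation
        total += (v @ h - y[i]) ** 2
    return total / X.shape[0]

X = rng.standard_normal((N, T, p))
y = rng.standard_normal(N)
print(rnn_loss(U, W, v, X, y))
```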
In the first part of the thesis, we focus on solving the nonconvex, nonsmooth, and highly composite optimization problem. Specifically, we first reformulate the unconstrained problem equivalently as a constrained optimization problem with a simple smooth objective function, introducing auxiliary variables to represent the composite structures and treating these representations as constraints. We prove the existence of global solutions and Karush-Kuhn-Tucker (KKT) points of the constrained problem. We then propose an ALM and design an efficient block coordinate descent (BCD) method to solve the ALM subproblems; the update of each block of the BCD method has a closed-form solution, and the stopping criterion for the inner loop is easy to check and is satisfied in finitely many steps. We show that any accumulation point of the sequences generated by the BCD method is a directional stationary point of the subproblem, and we establish the global convergence of the ALM to a KKT point of the constrained optimization problem. Numerical results demonstrate the efficiency and effectiveness of the ALM, compared with state-of-the-art algorithms, for training RNNs on synthetic datasets, the MNIST handwritten digit recognition task, the TIMIT audio denoising task, and a volatility forecasting task for the S&P index.
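The splitting idea — pull the nonsmooth part into an auxiliary variable, treat the representation as a constraint, and minimize the augmented Lagrangian blockwise with closed-form updates — can be illustrated on a toy analogue. This sketch is not the thesis algorithm: it applies the same pattern to min_x ||Ax - b||_1, rewritten as min ||u||_1 subject to u = Ax - b, where the u-block update is soft-thresholding and the x-block update is a least-squares solve, both in closed form.

```python
import numpy as np

rng = np.random.default_rng(1)

# Augmented Lagrangian for the constrained reformulation:
#   L(x, u, lam) = ||u||_1 + lam.T @ (A x - b - u) + (rho/2) ||A x - b - u||^2
m, n, rho = 30, 10, 1.0
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x = np.zeros(n)
u = np.zeros(m)
lam = np.zeros(m)

def soft_threshold(z, t):
    # Closed-form proximal operator of the l1 norm.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

for _ in range(200):
    # u-block: soft-thresholding, closed form
    u = soft_threshold(A @ x - b + lam / rho, 1.0 / rho)
    # x-block: least squares, closed form
    x = np.linalg.lstsq(A, b + u - lam / rho, rcond=None)[0]
    # multiplier update on the constraint residual u = A x - b
    lam = lam + rho * (A @ x - b - u)

print(np.abs(A @ x - b - u).max())  # constraint residual
```

The constraint residual shrinks toward zero as the multiplier update enforces u = Ax - b, mirroring how the ALM drives the auxiliary-variable representations to hold at convergence.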
In the second part of the thesis, we investigate the convergence of minimizers and stationary points of the SAA problems. We first establish the existence of optimal solutions of both the original problem and the SAA problems. We then prove that, as the sample size goes to infinity, any accumulation point of the sequences of minimizers and stationary points of the SAA problems is, with probability one, a minimizer and a stationary point of the original problem, respectively. We also derive uniform exponential rates of convergence of the objective functions of the SAA problems to that of the original problem.
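A minimal illustration of the SAA convergence statement (a textbook example, not from the thesis): for the original problem min_x E[(x - Z)^2] with Z ~ N(0,1), the minimizer is x* = E[Z] = 0, while the SAA problem min_x (1/N) Σ_i (x - z_i)^2 has the closed-form minimizer x_N = mean(z_1, ..., z_N), which converges to x* almost surely as N grows.

```python
import numpy as np

rng = np.random.default_rng(2)

# SAA minimizers x_N = sample mean converge a.s. to the true minimizer x* = 0
# of the expectation problem, by the strong law of large numbers.
for N in (10, 100, 10_000, 1_000_000):
    z = rng.standard_normal(N)   # N i.i.d. samples of Z
    x_N = z.mean()               # closed-form SAA minimizer
    print(N, abs(x_N))           # gap to x* = 0 shrinks with N
```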
Rights: All rights reserved
Access: open access

Files in This Item:
8224.pdf — For All Users — 1.15 MB — Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13783