Reinforcement Learning Algorithms: Acceleration Design and Non-asymptotic Theory


2021, Doctor of Philosophy, Ohio State University, Electrical and Computer Engineering.
Reinforcement learning (RL) aims to design strategies by which an agent, through interaction with an environment, finds a desirable policy that maximizes the cumulative reward for a task. RL has received rapidly growing attention in recent years and has achieved tremendous success across a variety of application domains. Yet existing RL algorithms can still suffer from poor performance, such as slow convergence, high variance, and suboptimality. As a result, important implementation techniques have been developed to overcome these issues, including momentum schemes, the adaptive moment estimation (Adam) method, and the double Q-learning architecture. The first focus of this dissertation is therefore the development of novel techniques that further improve the performance of RL algorithms. Furthermore, in contrast to the wide application of these techniques in RL, the understanding of their theoretical performance remains rather limited. The second focus of this dissertation is thus to provide convergence analysis for RL algorithms that adopt popular acceleration techniques, including 1) momentum schemes, 2) Adam-type schemes, and 3) the double Q-learning architecture.

First, we propose a new momentum scheme and apply it to Q-learning under both the tabular and function approximation settings. We show that the proposed momentum-based Q-learning algorithms achieve a better convergence rate than existing momentum-based Q-learning, and our experimental results show that the proposed algorithms outperform other existing algorithms of the same type.

Second, we establish the first convergence guarantee for Adam-type RL algorithms. In particular, we apply AMSGrad (a convergent variant of Adam) updates to temporal difference (TD) learning and policy gradient (PG) and establish their finite-time results under a practical Markovian sampling strategy. In addition, we study Q-learning with AMSGrad updates and propose a new Adam-type Q-learning algorithm that adopts a momentum restart technique. Our numerical results show that the proposed Adam-type Q-learning algorithm outperforms the popular deep Q-network (DQN) method.

Last, we provide the first finite-time convergence analysis for double Q-learning. We model the double Q-learning algorithm as a nested stochastic approximation (SA) problem and develop new techniques to analyze it. We provide non-asymptotic convergence results for both synchronous and asynchronous double Q-learning under polynomial learning rates, and further improve the results by adopting constant learning rates.
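For readers unfamiliar with the double Q-learning architecture analyzed in the final part of the abstract, the sketch below shows the standard tabular, asynchronous variant (two Q-tables, one chosen at random to be updated at each step, each using the other to evaluate its greedy action). This is only an illustrative baseline: it assumes a Gymnasium-style environment with reset()/step(), a constant learning rate alpha, and epsilon-greedy exploration, and it does not reproduce the specific learning-rate schedules or acceleration schemes developed in the dissertation.

```python
import numpy as np

def double_q_learning(env, num_states, num_actions, alpha=0.1, gamma=0.99,
                      epsilon=0.1, num_steps=100_000, seed=0):
    """Tabular (asynchronous) double Q-learning with epsilon-greedy exploration."""
    rng = np.random.default_rng(seed)
    q_a = np.zeros((num_states, num_actions))
    q_b = np.zeros((num_states, num_actions))

    state, _ = env.reset(seed=seed)
    for _ in range(num_steps):
        # Epsilon-greedy behavior policy derived from the sum of the two tables.
        if rng.random() < epsilon:
            action = int(rng.integers(num_actions))
        else:
            action = int(np.argmax(q_a[state] + q_b[state]))

        next_state, reward, terminated, truncated, _ = env.step(action)

        # The defining step of double Q-learning: pick one estimator at random
        # to update; it selects the greedy next action, while the other
        # estimator evaluates that action, which reduces overestimation bias.
        if rng.random() < 0.5:
            greedy = int(np.argmax(q_a[next_state]))
            target = reward + (0.0 if terminated else gamma * q_b[next_state, greedy])
            q_a[state, action] += alpha * (target - q_a[state, action])
        else:
            greedy = int(np.argmax(q_b[next_state]))
            target = reward + (0.0 if terminated else gamma * q_a[next_state, greedy])
            q_b[state, action] += alpha * (target - q_b[state, action])

        state = next_state
        if terminated or truncated:
            state, _ = env.reset()

    return q_a, q_b
```

The constant learning rate used here corresponds to the setting under which the abstract reports improved finite-time results; the polynomial-learning-rate analysis mentioned there would instead decay alpha with the number of visits to each state-action pair.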
Yingbin Liang (Advisor)
Wei Zhang (Committee Member)
Philip Schniter (Committee Member)
Hesham El Gamal (Committee Member)
262 p.

Recommended Citations


  • Xiong, H. (2021). Reinforcement Learning Algorithms: Acceleration Design and Non-asymptotic Theory [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1617960067796074

    APA Style (7th edition)

  • Xiong, Huaqing. Reinforcement Learning Algorithms: Acceleration Design and Non-asymptotic Theory. 2021. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1617960067796074.

    MLA Style (8th edition)

  • Xiong, Huaqing. "Reinforcement Learning Algorithms: Acceleration Design and Non-asymptotic Theory." Doctoral dissertation, Ohio State University, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=osu1617960067796074

    Chicago Manual of Style (17th edition)