IEEE Transactions on Automatic Control, Vol. 64, No. 9, pp. 3756-3763, 2019
Primal-Dual Q-Learning Framework for LQR Design
Recently, reinforcement learning (RL) has been receiving growing attention due to its successful demonstrations of performance surpassing humans in certain challenging tasks. The goal of this paper is to study a new optimization formulation of the linear quadratic regulator (LQR) problem via Lagrangian duality theory, in order to lay theoretical foundations for potentially effective RL algorithms. The new optimization problem is stated in terms of the Q-function parameters, so it can be directly used to develop Q-learning algorithms, which are among the most popular RL algorithms. We prove relations between saddle points of the Lagrangian function and optimal solutions of the Bellman equation. As an example of its applications, we propose a model-free primal-dual Q-learning algorithm to solve the LQR problem and demonstrate its validity through examples.
Keywords: Linear quadratic regulator (LQR); optimal control; reinforcement learning; Q-learning; linear time-invariant (LTI) system; duality
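To illustrate how Q-function parameters enter a model-free LQR method, the following is a minimal sketch of classical Q-learning policy iteration for LQR in the spirit of Bradtke, Ydstie, and Barto (1994), not the paper's primal-dual algorithm: the quadratic Q-function z^T H z is fit by least squares from trajectory data, and the feedback gain is improved from the learned H. All function names, the example system, and the hyperparameters are illustrative assumptions.

```python
import numpy as np

def phi(z):
    # Quadratic feature vector: upper-triangular entries of z z^T, with
    # off-diagonal terms doubled so that theta . phi(z) = z^T H z for a
    # symmetric H parameterized by its upper triangle.
    outer = np.outer(z, z)
    i, j = np.triu_indices(len(z))
    return outer[i, j] * np.where(i == j, 1.0, 2.0)

def theta_to_H(theta, n):
    # Rebuild the symmetric matrix H from its upper-triangular parameters.
    U = np.zeros((n, n))
    U[np.triu_indices(n)] = theta
    return U + U.T - np.diag(np.diag(U))

def q_learning_lqr(A, B, Qc, R, K, iters=10, T=400, sigma=0.5, seed=0):
    # Model-free Q-learning policy iteration for discrete-time LQR.
    # A and B are used only to simulate trajectories; the learner sees
    # just (x, u, cost, x_next) tuples. K must be initially stabilizing
    # for the policy-evaluation step to be well-posed.
    rng = np.random.default_rng(seed)
    n, m = B.shape
    for _ in range(iters):
        rows, targets = [], []
        x = rng.normal(size=n)
        for _ in range(T):
            u = -K @ x + sigma * rng.normal(size=m)   # exploration noise
            cost = x @ Qc @ x + u @ R @ u
            x_next = A @ x + B @ u
            z = np.concatenate([x, u])
            z_next = np.concatenate([x_next, -K @ x_next])
            # Bellman equation: z^T H z = cost + z_next^T H z_next.
            rows.append(phi(z) - phi(z_next))
            targets.append(cost)
            x = x_next if np.linalg.norm(x_next) < 1e6 else rng.normal(size=n)
        theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
        H = theta_to_H(theta, n + m)
        # Policy improvement: u = -K x minimizes z^T H z over u.
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return K, H

# Hypothetical example: a discretized double integrator with a crude but
# stabilizing initial gain; K converges toward the optimal LQR gain.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K, H = q_learning_lqr(A, B, np.eye(2), np.eye(1), K=np.array([[1.0, 2.0]]))
print(K)
```

The exploration noise injected into u provides the excitation needed to identify all entries of H; because the dynamics are deterministic, the Bellman equation holds exactly for every sample, so the least-squares fit recovers the policy's Q-function whenever the regressors have full rank.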