Automatica, Vol.44, No.4, 1111-1119, 2008
New algorithms of the Q-learning type
We propose two algorithms for Q-learning that use the two-timescale stochastic approximation methodology. The first updates the Q-values of all feasible state-action pairs at each instant, while the second updates only the Q-values of states with actions chosen according to the 'current' randomized policy updates. A proof of convergence of the algorithms is given. Finally, numerical experiments with the proposed algorithms on an application to routing in communication networks are presented under a few different settings. (c) 2007 Elsevier Ltd. All rights reserved.
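The abstract's first algorithm performs a synchronous sweep, updating the Q-values of every feasible state-action pair at each instant. A minimal sketch of such a synchronous update is below; the MDP, the step-size schedule, and all parameter values are assumptions for illustration only, and the paper's slower-timescale policy recursion (the second timescale) is omitted. The estimate is compared against the fixed point computed by value iteration.

```python
import numpy as np

# Hypothetical small MDP used only for illustration; the paper's routing
# application and exact two-timescale recursions are not reproduced here.
n_states, n_actions = 2, 2
rng = np.random.default_rng(0)
# P[s, a] is a distribution over next states; R[s, a] an expected reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.random((n_states, n_actions))
gamma = 0.8  # discount factor (assumed)

Q = np.zeros((n_states, n_actions))
for n in range(1, 5001):
    step = 1.0 / n**0.6  # assumed diminishing step-size schedule
    # Synchronous sweep: update Q for ALL feasible (s, a) pairs each instant,
    # in the spirit of the paper's first algorithm.
    for s in range(n_states):
        for a in range(n_actions):
            s_next = rng.choice(n_states, p=P[s, a])  # simulated transition
            target = R[s, a] + gamma * Q[s_next].max()
            Q[s, a] += step * (target - Q[s, a])

# Baseline: Q* from value iteration on the known model, for comparison.
Q_star = np.zeros_like(Q)
for _ in range(1000):
    Q_star = R + gamma * P @ Q_star.max(axis=1)
```

Because the transitions are simulated rather than averaged over the model, `Q` converges to a neighborhood of `Q_star` whose size shrinks with the step size.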