Automatica, Vol.49, No.12, 3687-3689, 2013
Policy set iteration for Markov decision processes
This communique presents an algorithm called "policy set iteration" (PSI) for solving infinite horizon discounted Markov decision processes with finite state and action spaces as a simple generalization of policy iteration (PI). PSI generates a monotonically improving sequence of stationary Markovian policies {pi(k)*} based on a set manipulation, as opposed to PI's single policy manipulation, at each iteration k. When the set involved with PSI at k contains N independently generated sample-policies from a given distribution d, the probability that the expected value of any sampled policy from d with respect to an initial state distribution is greater than that of pi(k)* converges to zero with O(N-k) rate. Moreover, PSI converges to an optimal policy no slower than PI in terms of the number of iterations for any d. (C) 2013 Elsevier Ltd. All rights reserved.