SIAM Journal on Control and Optimization, Vol. 53, No. 4, pp. 1982-2016, 2015
ON CONVERGENCE OF VALUE ITERATION FOR A CLASS OF TOTAL COST MARKOV DECISION PROCESSES
We consider a general class of total cost Markov decision processes (MDPs) in which the one-stage costs can have arbitrary signs, but the sum of the negative parts of the one-stage costs is finite for all policies and all initial states. This class, which we refer to as the general convergence (GC) total cost model, contains several important subclasses of problems, e.g., positive cost problems, bounded negative cost problems, and discounted problems with unbounded one-stage costs. We study the convergence of value iteration for the (GC) model, in the Borel MDP framework with universally measurable policies. Our main results include (i) convergence of value iteration when starting from certain functions above the optimal cost function; (ii) convergence of transfinite value iteration starting from zero, as well as convergence of ordinary nontransfinite value iteration for finite control or certain finite state (GC) problems, in the special case where the optimal cost function is nonnegative; and (iii) partial convergence of value iteration starting from zero, for a subset of initial states. These results extend several previously known results about the convergence of value iteration for either positive cost problems or (GC) total cost problems.
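The paper works in an abstract Borel MDP framework, but the iteration it studies is the familiar dynamic programming recursion J_{k+1} = T J_k. For intuition only, here is a minimal sketch on a hypothetical finite-state, finite-control total cost MDP with mixed-sign one-stage costs (the transition matrices `P`, costs `g`, and the specific numbers are illustrative assumptions, not from the paper); the negative costs are incurred only finitely often in expectation along every policy, as the (GC) model requires:

```python
import numpy as np

# Hypothetical finite MDP: states {0, 1, 2}, controls {0, 1}.
# P[u] is the transition matrix under control u; g[u] the one-stage costs.
# Costs have arbitrary signs, but state 2 is absorbing and cost-free,
# so the expected sum of negative cost parts is finite for every policy.
P = [np.array([[0.0, 0.5, 0.5],
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 1.0]]),
     np.array([[0.0, 1.0, 0.0],
               [0.5, 0.0, 0.5],
               [0.0, 0.0, 1.0]])]
g = [np.array([2.0, 1.0, 0.0]),
     np.array([-1.0, 3.0, 0.0])]

def bellman(J):
    """One application of the dynamic programming operator T:
    (T J)(x) = min_u [ g(x, u) + sum_y p(y | x, u) J(y) ]."""
    return np.min([g[u] + P[u] @ J for u in range(len(P))], axis=0)

# Ordinary (nontransfinite) value iteration J_{k+1} = T J_k,
# started from the zero function J_0 = 0.
J = np.zeros(3)
for _ in range(200):
    J_next = bellman(J)
    if np.max(np.abs(J_next - J)) < 1e-12:
        break
    J = J_next
print(J)  # converges to the optimal cost function, here (0, 1, 0)
```

In this toy instance the iterates are not monotone (the first iterate dips below zero at state 0 because of the negative cost), yet the sequence still reaches the nonnegative optimal cost function, matching the flavor of result (ii) for finite control (GC) problems; in general Borel models the paper shows convergence may require transfinite iteration or a suitable starting function above the optimum.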