IEEE Transactions on Automatic Control, Vol.62, No.12, 6614-6618, 2017
Whittle Index for Partially Observed Binary Markov Decision Processes
We consider the problem of dynamically scheduling M out of N binary Markov chains when only noisy observations of state are available, with ergodic (equivalently, long run average) reward. By passing on to the equivalent problem of controlling the conditional distribution of state given observations and controls, it is cast as a restless bandit problem and its Whittle indexability is established.
Keywords:Binary bandits;dynamic programming;ergodic cost;partially observed bandits;threshold policies;Whittle index