Whittle Index for Partially Observed Binary Markov Decision Processes

Borkar VS

IEEE Transactions on Automatic Control, Vol.62, No.12, 6614-6618, 2017

DOI10.1109/TAC.2017.2715329 Export Citation

Whittle Index for Partially Observed Binary Markov Decision Processes

We consider the problem of dynamically scheduling M out of N binary Markov chains when only noisy observations of state are available, with ergodic (equivalently, long run average) reward. By passing on to the equivalent problem of controlling the conditional distribution of state given observations and controls, it is cast as a restless bandit problem and its Whittle indexability is established.

Keywords:Binary bandits;dynamic programming;ergodic cost;partially observed bandits;threshold policies;Whittle index