The Multi-Armed Bandit With Stochastic Plays

Lesage-Landry A; Taylor JA

IEEE Transactions on Automatic Control, Vol. 63, No. 7, pp. 2280-2286, 2018

DOI: 10.1109/TAC.2017.2765501

We extend the stochastic multi-armed bandit to the case where the number of arms to play evolves as a stationary process. Our work is motivated by demand response in power systems, in which the number of arms to play, or loads to dispatch, depends on a random power imbalance. We give an upper confidence bound-based algorithm that achieves sublinear pseudo-regret. We apply our results in several examples from demand response.

Keywords: Demand response; multi-armed bandit (MAB); online learning; stochastic bandit
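
To make the setting in the abstract concrete, here is a minimal simulation sketch. It is not the authors' algorithm. It assumes a standard UCB1 index, Bernoulli rewards, and a number of plays per round drawn i.i.d. uniformly from 1 to max_plays as a stand-in for the random power imbalance. The function name ucb_stochastic_plays and all parameter names are hypothetical.

```python
import math
import random


def ucb_stochastic_plays(means, horizon, max_plays, seed=0):
    """Simulate a UCB1-style index policy when the number of plays is random.

    means      -- assumed Bernoulli success probabilities, one per arm
    horizon    -- number of rounds to simulate
    max_plays  -- upper bound on arms played per round (<= len(means))
    Returns (pull_counts, pseudo_regret).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms      # number of times each arm was played
    totals = [0.0] * n_arms    # summed observed reward per arm
    best_first = sorted(means, reverse=True)
    pseudo_regret = 0.0

    for t in range(1, horizon + 1):
        # Number of arms to play this round (assumption: i.i.d. uniform).
        m = rng.randint(1, max_plays)

        def index(arm):
            if counts[arm] == 0:
                return math.inf  # unplayed arms are tried first
            bonus = math.sqrt(2.0 * math.log(t) / counts[arm])
            return totals[arm] / counts[arm] + bonus

        # Play the m arms with the largest upper confidence indices.
        chosen = sorted(range(n_arms), key=index, reverse=True)[:m]

        for arm in chosen:
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            totals[arm] += reward

        # Pseudo-regret: mean reward of the m best arms minus the chosen arms.
        pseudo_regret += sum(best_first[:m]) - sum(means[a] for a in chosen)

    return counts, pseudo_regret


if __name__ == "__main__":
    counts, regret = ucb_stochastic_plays([0.9, 0.8, 0.3, 0.2, 0.1], 2000, 2)
    print("pull counts:", counts)
    print("cumulative pseudo-regret:", round(regret, 2))
```

In a demand response reading, each arm would be a load and m the number of loads to dispatch in a round. Rerunning the sketch with increasing horizons and plotting the cumulative pseudo-regret is one way to check informally whether it grows sublinearly under these assumptions.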