Chemical Engineering and Materials Research Information Center (화학공학소재연구정보센터)
IEEE Transactions on Automatic Control, Vol.63, No.7, 2280-2286, 2018
The Multi-Armed Bandit With Stochastic Plays
We extend the stochastic multi-armed bandit to the case where the number of arms to play evolves as a stationary process. Our work is motivated by demand response in power systems, in which the number of arms to play, or loads to dispatch, depends on a random power imbalance. We give an upper confidence bound-based algorithm that achieves sublinear pseudo-regret. We apply our results in several examples from demand response.
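The setting in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it assumes Bernoulli rewards, the standard UCB1 index, and a number of plays drawn i.i.d. and uniformly from a fixed list (one simple kind of stationary process). The names `ucb_stochastic_plays`, `means`, and `play_values` are hypothetical and chosen only for this example. In each round it plays the arms with the highest indices, as many as that round's draw requires, and accumulates pseudo-regret against an oracle that plays the same number of truly best arms.

```python
import math
import random


def ucb_stochastic_plays(means, play_values, horizon, seed=0):
    """Simulate a UCB1-style policy when the number of plays per round is random.

    means       -- true Bernoulli success probabilities (assumed, for simulation only)
    play_values -- possible numbers of arms to play; one is drawn uniformly each round
    horizon     -- number of rounds
    Returns (pseudo_regret, counts).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # times each arm has been played
    sums = [0.0] * k   # total reward collected from each arm
    # Arms sorted by true mean, used only to evaluate the oracle benchmark.
    oracle_order = sorted(range(k), key=lambda i: means[i], reverse=True)
    regret = 0.0

    for t in range(1, horizon + 1):
        m = rng.choice(play_values)  # random number of arms to play this round

        def index(i):
            # Unplayed arms get an infinite index so each is tried at least once.
            if counts[i] == 0:
                return float("inf")
            bonus = math.sqrt(2.0 * math.log(t) / counts[i])
            return sums[i] / counts[i] + bonus

        # Play the m arms with the largest upper confidence bounds.
        chosen = sorted(range(k), key=index, reverse=True)[:m]
        for i in chosen:
            reward = 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
            sums[i] += reward

        # Pseudo-regret: expected reward of the best m arms minus that of the chosen ones.
        best = sum(means[i] for i in oracle_order[:m])
        regret += best - sum(means[i] for i in chosen)

    return regret, counts


if __name__ == "__main__":
    regret, counts = ucb_stochastic_plays([0.9, 0.8, 0.5, 0.3, 0.1], [1, 2, 3], 5000)
    print("pseudo-regret:", round(regret, 2))
    print("plays per arm:", counts)
```

In a demand response reading, each arm would be a load and the per-round draw would stand in for the number of loads needed to cover the power imbalance. Running the sketch at several horizons and comparing pseudo-regret divided by horizon is one informal way to see whether growth looks sublinear under these assumptions.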