IEEE Transactions on Automatic Control, Vol.62, No.12, 6415-6422, 2017
Thompson Sampling for Stochastic Control: The Finite Parameter Case
In this paper, we apply Thompson sampling to a class of average reward stochastic control problems with parameter uncertainty. Specifically, we study an average reward stochastic control problem over an infinite horizon in which both the reward and state transition distributions are parameterized by an unknown parameter taking values in a finite space. The main result of this paper is a proof that Thompson sampling achieves a worst-case average per-period regret of $O(T^{-1})$, which is asymptotically optimal.
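The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions. Assumptions not taken from the paper: a hypothetical two-state, two-action problem with a two-element parameter space (`theta_0`, `theta_1`), Bernoulli rewards, a uniform prior, resampling the parameter every period, and relative value iteration to compute each candidate's optimal average-reward policy. All names (`MODELS`, `optimal_policy`, `normalize`, `thompson_sampling`) are illustrative, not the authors' code.

```python
import math
import random

# Hypothetical finite parameter space: each candidate model specifies transition
# probabilities P[s][a][s'] and Bernoulli reward means R[s][a].
MODELS = {
    "theta_0": {"P": [[[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.1, 0.9]]],
                "R": [[0.2, 0.5], [0.6, 0.9]]},
    "theta_1": {"P": [[[0.3, 0.7], [0.8, 0.2]], [[0.4, 0.6], [0.9, 0.1]]],
                "R": [[0.8, 0.1], [0.3, 0.2]]},
}


def optimal_policy(model, iters=500):
    """Relative value iteration for the average-reward criterion.

    Assumes the chain is unichain and aperiodic (true here: all P entries > 0).
    Returns (greedy deterministic policy, estimated optimal gain).
    """
    P, R = model["P"], model["R"]
    n_s, n_a = len(R), len(R[0])
    h = [0.0] * n_s
    gain = 0.0
    for _ in range(iters):
        q = [[R[s][a] + sum(P[s][a][t] * h[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        th = [max(row) for row in q]
        gain = th[0]                    # reference state 0 has h[0] == 0
        h = [v - th[0] for v in th]     # keep relative values bounded
    policy = [max(range(n_a), key=lambda a: q[s][a]) for s in range(n_s)]
    return policy, gain


def normalize(log_post):
    """Turn unnormalized log-posterior weights into probabilities."""
    m = max(log_post.values())
    w = {k: math.exp(v - m) for k, v in log_post.items()}
    z = sum(w.values())
    return {k: v / z for k, v in w.items()}


def thompson_sampling(true_theta, horizon, seed=0):
    """Run Thompson sampling for `horizon` periods against the model `true_theta`.

    Returns (average reward per period, posterior over the finite parameter space).
    """
    rng = random.Random(seed)
    names = list(MODELS)
    log_post = {k: -math.log(len(names)) for k in names}      # uniform prior
    policies = {k: optimal_policy(MODELS[k])[0] for k in names}
    truth = MODELS[true_theta]
    s, total = 0, 0.0
    for _ in range(horizon):
        # Sample a parameter from the current posterior, act optimally for it.
        post = normalize(log_post)
        theta = rng.choices(names, weights=[post[k] for k in names])[0]
        a = policies[theta][s]
        # The environment evolves under the true (unknown) parameter.
        y = 1 if rng.random() < truth["R"][s][a] else 0
        probs = truth["P"][s][a]
        s_next = rng.choices(range(len(probs)), weights=probs)[0]
        total += y
        # Bayes update with the likelihood of the observed (reward, next state).
        for k in names:
            r = MODELS[k]["R"][s][a]
            p = MODELS[k]["P"][s][a][s_next]
            log_post[k] += math.log(r if y else 1.0 - r) + math.log(p)
        s = s_next
    return total / horizon, normalize(log_post)
```

Usage: `avg, post = thompson_sampling("theta_0", horizon=20000, seed=1)` followed by `g_star - avg`, where `g_star = optimal_policy(MODELS["theta_0"])[1]`, gives an empirical average per-period regret for this toy instance. Because every state-action pair in this toy example distinguishes the two candidates, the posterior concentrates quickly on the true parameter, after which the sampled policy coincides with the optimal one.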