- Previous Article
- Next Article
- Table of Contents
IEEE Transactions on Automatic Control, Vol.63, No.4, 915-930, 2018
Stochastic Online Shortest Path Routing: The Value of Feedback
This paper studies online shortest path routing over multihop networks. Link costs or delays are time varying and modeled by independent and identically distributed random processes, whose parameters are initially unknown. The parameters, and hence the optimal path, can only be estimated by routing packets through the network and observing the realized delays. Our aim is to find a routing policy that minimizes the regret (the cumulative difference of expected delay) between the path chosen by the policy and the unknown optimal path. We formulate the problem as a combinatorial bandit optimization problem and consider several scenarios that differ in where routing decisions are made and in the information available when making the decisions. For each scenario, we derive a tight asymptotic lower bound on the regret that has to be satisfied by any online routing policy. Three algorithms, with a tradeoff between computational complexity and performance, are proposed. The regret upper bounds of these algorithms improve over those of the existing algorithms. We also assess numerically the performance of the proposed algorithms and compare it to that of existing algorithms.
Keywords:Online combinatorial optimization;shortest path routing;stochastic multiarmed bandits (MAB)