Weighted Restless Bandit and Its Applications

Document Type

Conference Proceeding

Publication Date

1-1-2015

Abstract

© 2015 IEEE. Motivated by applications such as cognitive radio spectrum scheduling, downlink fading channel scheduling, and unmanned aerial vehicle dynamic routing, we study two restless bandit problems. Given a bandit consisting of multiple restless arms, the state of each arm evolves as a Markov chain. Each arm is associated with a positive weight. At each step, we select a subset of arms to play such that the sum of the weights of the selected arms does not exceed a limit. The reward of playing each arm varies according to the arm's state. The exact state of each arm is revealed only when the arm is played. The weighted restless bandit problem aims to maximize the expected average reward over the infinite horizon. We also study an extended problem, called multiply-constrained restless bandit, in which two constraints on the selected arms apply simultaneously at each step. First, the sum of the weights of the selected arms cannot exceed a limit. Second, the number of selected arms is at most a constant K. The objective of multiply-constrained restless bandit is to maximize the long-term average reward. Both problems are partially observable Markov decision processes and have been proved to be PSPACE-hard even in special cases. We propose constant-factor approximation algorithms for both problems. Our method involves solving a semi-infinite program, converting its solution back to a low-complexity policy, and accounting for the average reward via a Lyapunov function analysis.
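The two per-step constraints described in the abstract can be illustrated with a short sketch. This is not the approximation algorithm proposed in the paper: it is a brute-force, myopic selection over a handful of arms, shown only to make the feasibility conditions concrete. The function names (`is_feasible`, `best_myopic_subset`) and the inputs (`expected_rewards`, `weights`, `budget`, `max_arms`) are hypothetical and chosen for illustration.

```python
from itertools import combinations


def is_feasible(selected, weights, budget, max_arms=None):
    """Check the per-step constraints on a set of selected arm indices.

    The weight constraint (sum of weights <= budget) always applies.
    The cardinality constraint (at most max_arms arms) applies only when
    max_arms is given, as in the multiply-constrained variant.
    """
    if sum(weights[i] for i in selected) > budget:
        return False
    if max_arms is not None and len(selected) > max_arms:
        return False
    return True


def best_myopic_subset(expected_rewards, weights, budget, max_arms=None):
    """Enumerate all subsets and return the feasible one with the highest
    expected immediate reward. Exponential in the number of arms, so this
    is an illustration for tiny instances, not a practical policy.
    """
    n = len(weights)
    best, best_value = (), 0.0
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if not is_feasible(subset, weights, budget, max_arms):
                continue
            value = sum(expected_rewards[i] for i in subset)
            if value > best_value:
                best, best_value = subset, value
    return best, best_value


# Hypothetical instance: three arms with weights 2, 3, 4 and a budget of 5.
weights = [2, 3, 4]
expected_rewards = [1.0, 2.0, 2.5]
weighted_choice = best_myopic_subset(expected_rewards, weights, budget=5)
constrained_choice = best_myopic_subset(expected_rewards, weights, budget=5, max_arms=1)
```

In this toy instance the weight limit alone admits playing arms 0 and 1 together, while adding a cap of one arm per step forces a single arm to be chosen. A myopic rule like this ignores how unplayed arms keep evolving, which is the difficulty the long-run average-reward objective has to address.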

Publication Title

Proceedings - International Conference on Distributed Computing Systems
