Asymptotic optimal control of multi-class restless bandits - HAL open archive
Preprint, Working Paper Year: 2013

Asymptotic optimal control of multi-class restless bandits

Abstract

We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable process whose state evolution depends on whether or not the bandit is made active. The aim is to find a control that determines at each decision epoch which bandits to make active in order to minimize the overall average cost associated with the states the bandits are in. Since finding the optimal control is typically intractable, we instead study an asymptotic regime obtained by letting the number of bandits that can be made active simultaneously grow proportionally with the population of bandits. We consider both a fixed population of bandits and a dynamic population in which bandits can depart and new bandits can arrive over time. We propose a class of priority policies, obtained by solving a linear program, that are proved to be asymptotically optimal under a global attractor property and a technical condition. Indexability of the bandits is not required for the result to hold. For a fixed population of bandits, the technical condition reduces to checking a unichain property. For a dynamic population of bandits, we present a large class of restless bandit problems for which the technical condition is always satisfied. As an example, we present a multi-class M/M/S+M queue, which belongs to this class of problems and satisfies the global attractor property. Hence, asymptotic optimality of an index policy follows. In case the bandits are indexable, we prove that Whittle's index policy is included in the class of asymptotically optimal policies. This generalizes the result of Weber and Weiss (1990), who showed asymptotic optimality of Whittle's index policy for a symmetric fixed population of bandits, to the setting of (i) several classes of bandits, (ii) multiple actions, and (iii) possible arrivals of new bandits. To prove the main results, we combine fluid-scaling techniques with linear programming results. This proof approach differs from that of Weber and Weiss and, in contrast to the latter, allows arrivals of new bandits to the system to be included.
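The abstract does not spell out how a priority policy selects bandits, so the following is only a rough sketch of the general idea under simplifying assumptions: a single class of bandits, two actions (active or passive), and a priority ordering over states that is simply given as input rather than derived from a linear program. The function name `activate_by_priority` and its parameters are hypothetical and do not come from the paper.

```python
def activate_by_priority(states, priority_order, budget):
    """Return the indices of the bandits to make active.

    states         -- current state of each bandit (assumed hashable labels)
    priority_order -- states listed from highest to lowest priority
                      (assumed given; the paper obtains priorities from an LP)
    budget         -- maximum number of bandits that may be active at once
    """
    # Map each state to its rank: a lower rank means a higher priority.
    rank = {state: position for position, state in enumerate(priority_order)}

    # Order bandits by the rank of their current state. sorted() is stable,
    # so ties are broken by bandit index.
    by_priority = sorted(range(len(states)), key=lambda i: rank[states[i]])

    # Activate as many of the highest-priority bandits as the budget allows.
    return sorted(by_priority[:budget])


# Five bandits, states ranked 2 > 1 > 0, at most three active at once.
chosen = activate_by_priority([2, 0, 1, 0, 2], [2, 1, 0], 3)
print(chosen)
```

In the asymptotic regime described above, `budget` would grow proportionally with the number of bandits, for example as a fixed fraction of `len(states)`.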
Main file
RBP.pdf (459.75 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00743781, version 1 (22-10-2012)
hal-00743781, version 2 (03-09-2013)
hal-00743781, version 3 (07-07-2014)
hal-00743781, version 4 (01-04-2015)
hal-00743781, version 5 (09-09-2015)
hal-00743781, version 6 (29-02-2016)

Identifiers

  • HAL Id: hal-00743781, version 2

Cite

Ina Maria Maaike Verloop. Asymptotic optimal control of multi-class restless bandits. 2013. ⟨hal-00743781v2⟩
517 Views
896 Downloads
