Preprint, Working Paper. Year: 2012

Asymptotic optimal control of multi-class restless bandits

Abstract

We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable process whose state evolution depends on whether or not the bandit is made active. The aim is to find a control that determines at each decision epoch which bandits to make active in order to minimize the overall average cost associated with the states the bandits are in. Since finding the optimal control is typically intractable, we instead study an asymptotic regime obtained by letting the number of bandits that can be simultaneously made active grow proportionally with the population of bandits. We consider both a fixed population of bandits and a dynamic population in which bandits can depart and new bandits can arrive over time. We propose a class of priority policies, obtained by solving a linear program, that are proved to be asymptotically optimal under certain technical conditions. Indexability of the bandits is not required for the result to hold. For a fixed population of bandits, the technical conditions reduce to checking that a differential equation has a global attractor. For a dynamic population of bandits, additional conditions are needed due to the infinite state space. If the bandits are indexable, we prove that Whittle's index policy is included in the class of asymptotically optimal policies. This generalizes the result of Weber and Weiss (1990), who showed asymptotic optimality of Whittle's index policy for a symmetric fixed population of bandits, to the setting of (i) several classes of bandits and (ii) possible arrivals of new bandits. To prove the main results, we combine fluid-scaling techniques with linear programming results. This proof approach differs from that of Weber and Weiss and, in contrast to the latter, allows arrivals of new bandits to the system to be included.
Finally, we present a case study of impatient bandits: we show that the technical conditions related to the infinite state space are always satisfied and, hence, asymptotic optimality can be concluded once the global attractor property is proved. For the special case of a multi-class M/M/S queue with impatient bandits, the latter is satisfied, and hence we can derive an asymptotically optimal index policy.
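
As a rough illustration of the global attractor condition mentioned in the abstract, the sketch below integrates a fluid differential equation for a toy two-state bandit under a fixed priority policy and checks numerically that trajectories from different starting points end at the same place. Everything specific here is an assumption made for illustration and is not taken from the paper: the two-state model, the transition rates `a0`, `b0`, `a1`, `b1`, the activation fraction `alpha`, the "state 1 first" priority order, and the function names. A numerical check of this kind only suggests convergence; it does not prove it.

```python
def fluid_drift(x, alpha, a0, b0, a1, b1):
    """Drift of x, the fraction of bandits in state 1 (hypothetical toy model).

    Passive bandits move 0 -> 1 at rate a0 and 1 -> 0 at rate b0;
    active bandits move 0 -> 1 at rate a1 and 1 -> 0 at rate b1.
    A fraction alpha of the population may be active, state 1 first.
    """
    u1 = min(x, alpha)              # active mass in state 1 (top priority)
    u0 = min(1.0 - x, alpha - u1)   # leftover activation budget goes to state 0
    inflow = u0 * a1 + (1.0 - x - u0) * a0   # mass moving 0 -> 1
    outflow = u1 * b1 + (x - u1) * b0        # mass moving 1 -> 0
    return inflow - outflow


def integrate(x0, alpha=0.4, a0=1.0, b0=2.0, a1=3.0, b1=0.5,
              dt=0.01, steps=2000):
    """Explicit Euler integration of the fluid ODE starting from x0."""
    x = x0
    for _ in range(steps):
        x += dt * fluid_drift(x, alpha, a0, b0, a1, b1)
    return x


# Start from several initial fractions and compare where the trajectories end.
endpoints = [integrate(x0) for x0 in (0.0, 0.25, 0.5, 1.0)]
spread = max(endpoints) - min(endpoints)
```

With these assumed rates the drift is continuous and strictly decreasing in `x`, so the one-dimensional dynamics have a single equilibrium and `spread` comes out negligibly small. In higher-dimensional or multi-class models such a check is far less conclusive.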
Main file
Asym_opt_bandits.pdf (279.95 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-00743781, version 1 (22-10-2012)
hal-00743781, version 2 (03-09-2013)
hal-00743781, version 3 (07-07-2014)
hal-00743781, version 4 (01-04-2015)
hal-00743781, version 5 (09-09-2015)
hal-00743781, version 6 (29-02-2016)

Identifiers

  • HAL Id: hal-00743781, version 1

Cite

Ina Maria Maaike Verloop. Asymptotic optimal control of multi-class restless bandits. 2012. ⟨hal-00743781v1⟩