Asymptotic optimal control of multi-class restless bandits

Ina Maria Maaike Verloop

Preprint, Working Paper. Year: 2013

Role: Corresponding author
PersonId : 738383
IdHAL : maaike-verloop
IdRef : 188434208

Institut de recherche en informatique de Toulouse

Abstract

We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable process whose state evolution depends on whether or not the bandit is made active. The aim is to find a control that determines, at each decision epoch, which bandits to make active in order to minimize the overall average cost associated with the states the bandits are in. Since finding the optimal control is typically intractable, we instead study an asymptotic regime obtained by letting the number of bandits that can be simultaneously made active grow proportionally with the population of bandits. We consider both a fixed population of bandits and a dynamic population, in which bandits can depart and new bandits can arrive to the system over time. We propose a class of priority policies, obtained by solving a linear program, that are proved to be asymptotically optimal under a global attractor property and a technical condition. Indexability of the bandits is not required for the result to hold. For a fixed population of bandits, the technical condition reduces to checking a unichain property. For a dynamic population of bandits, we present a large class of restless bandit problems for which the technical condition is always satisfied. As an example, we present a multi-class M/M/S+M queue, which belongs to this class of problems and satisfies the global attractor property. Hence, asymptotic optimality of an index policy follows. If the bandits are indexable, we prove that Whittle's index policy is included in the class of asymptotically optimal policies. This generalizes the result of Weber and Weiss (1990), who showed asymptotic optimality of Whittle's index policy for a symmetric fixed population of bandits, to the setting of (i) several classes of bandits, (ii) multiple actions, and (iii) possible arrivals of new bandits. To prove the main results, we combine fluid-scaling techniques with linear programming results. This proof approach differs from that of Weber and Weiss and, in contrast to the latter, allows arrivals of new bandits to the system to be included.
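As a rough illustration of what a priority policy does at a single decision epoch, the sketch below picks which bandits to activate from a strict ordering over (class, state) pairs. It is not taken from the paper: the function name `select_active`, the two-action setting (active or passive only), the rule that unlisted pairs are never activated, and the hand-written priority order are all assumptions made for this example. In the paper the ordering is obtained from the solution of a linear program, which is not reproduced here.

```python
from typing import Hashable, List, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]  # (class, state) of one bandit


def select_active(bandits: Sequence[Pair],
                  priority: Sequence[Pair],
                  budget: int) -> List[int]:
    """Return indices of bandits to activate under a strict priority rule.

    bandits  -- current (class, state) of every bandit in the population
    priority -- (class, state) pairs, highest priority first; pairs that
                are not listed are never activated (an assumption here)
    budget   -- maximum number of bandits that may be active at once
    """
    rank = {pair: r for r, pair in enumerate(priority)}
    eligible = [i for i, pair in enumerate(bandits) if pair in rank]
    # Stable sort: bandits in the same (class, state) keep index order.
    eligible.sort(key=lambda i: rank[bandits[i]])
    return eligible[:max(budget, 0)]


# Hypothetical population of N = 4 bandits from two classes "a" and "b".
bandits = [("a", 0), ("b", 1), ("a", 1), ("b", 0)]
# Hypothetical priority order; ("b", 0) is left out, so it stays passive.
priority = [("b", 1), ("a", 1), ("a", 0)]
# The activation budget grows proportionally with the population size.
alpha = 0.5
budget = int(alpha * len(bandits))

active = select_active(bandits, priority, budget)
print(active)  # [1, 2]
```

With a budget of 2, the bandits in the two highest-priority pairs are activated. Raising the budget above the number of eligible bandits activates every listed bandit and still leaves the one in state ("b", 0) passive.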

Keywords

Restless bandits; asymptotic optimality; Whittle's index policy; arm-acquiring bandits

Domains

Optimization and Control [math.OC]

Main file

RBP.pdf (459.75 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-00743781

Submitted on: Tuesday, 3 September 2013, 14:09:21

Last modified on: Monday, 20 November 2023, 11:44:19

Long-term archiving on: Thursday, 6 April 2017, 10:32:51

Dates and versions

hal-00743781 , version 1 (22-10-2012)

hal-00743781 , version 2 (03-09-2013)

hal-00743781 , version 3 (07-07-2014)

hal-00743781 , version 4 (01-04-2015)

hal-00743781 , version 5 (09-09-2015)

hal-00743781 , version 6 (29-02-2016)

Identifiers

HAL Id : hal-00743781 , version 2

Cite

Ina Maria Maaike Verloop. Asymptotic optimal control of multi-class restless bandits. 2013. ⟨hal-00743781v2⟩
