Asymptotic optimal control of multi-class restless bandits

Ina Maria Maaike Verloop

Preprint, Working Paper. Year: 2012

Ina Maria Maaike Verloop

Role: Corresponding author
PersonId: 738383
IdHAL: maaike-verloop
IdRef: 188434208

Institut de recherche en informatique de Toulouse

Abstract

We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable process whose state evolution depends on whether or not the bandit is made active. The aim is to find a control that determines, at each decision epoch, which bandits to make active in order to minimize the overall average cost associated with the states the bandits are in. Since finding the optimal control is typically intractable, we instead study an asymptotic regime obtained by letting the number of bandits that can be made active simultaneously grow proportionally with the population of bandits. We consider both a fixed population of bandits and a dynamic population in which bandits can depart and new bandits can arrive to the system over time. We propose a class of priority policies, obtained by solving a linear program, that are proved to be asymptotically optimal under certain technical conditions. Indexability of the bandits is not required for the result to hold. For a fixed population of bandits, the technical conditions reduce to checking that a differential equation has a global attractor. For a dynamic population of bandits, additional conditions are needed due to the infinite state space. If the bandits are indexable, we prove that Whittle's index policy is included in the class of asymptotically optimal policies. This generalizes the result of Weber and Weiss (1990), who showed asymptotic optimality of Whittle's index policy for a symmetric fixed population of bandits, to the setting of (i) several classes of bandits and (ii) possible arrivals of new bandits. To prove the main results, we combine fluid-scaling techniques with linear programming results. This proof approach differs from that of Weber and Weiss and, in contrast to the latter, allows arrivals of new bandits to the system to be included.
Finally, we present a case study of impatient bandits: we show that the technical conditions related to the infinite state space are always satisfied and, hence, asymptotic optimality can be concluded once the global attractor property is proved. For the special case of a multi-class M/M/S queue with impatient bandits, the latter is satisfied, and hence we can derive an asymptotically optimal index policy.
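As a rough illustration of what a priority policy does at a single decision epoch, the Python sketch below activates bandits in order of a given ranking of states until the activation budget is used up. It is only a sketch under stated assumptions: the function name `priority_activation`, the state labels, and the ordering passed in are hypothetical and are not taken from the paper, where the ordering is obtained from the solution of a linear program.

```python
def priority_activation(states, priority_order, budget):
    """Return the indices of the bandits to make active.

    states: current state of each bandit (hashable labels, hypothetical).
    priority_order: states listed from highest to lowest priority. This is
        an assumed input; the abstract says such an ordering comes from a
        linear program, which is not reproduced here.
    budget: maximum number of bandits that may be active simultaneously.
    """
    rank = {state: r for r, state in enumerate(priority_order)}
    # Assumption: bandits in a state absent from the ordering stay passive.
    candidates = [i for i, s in enumerate(states) if s in rank]
    # list.sort is stable, so ties within a state are broken by bandit index.
    candidates.sort(key=lambda i: rank[states[i]])
    return candidates[:budget]


states = ["idle", "busy", "idle", "urgent", "busy"]
active = priority_activation(states, ["urgent", "busy", "idle"], budget=2)
print(active)  # [3, 1]
```

In the asymptotic regime described above, `budget` would grow proportionally with `len(states)`. The tie-breaking rule within a state is a choice made for this sketch only.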

Keywords

Restless bandits, asymptotic optimality, Whittle's index policy, arm-acquiring bandits

Subject areas

Optimization and control [math.OC]

Main file

Asym_opt_bandits.pdf (279.95 KB)

Origin: Publisher files allowed on an open archive

https://hal.science/hal-00743781

Submitted on: Monday, 22 October 2012, 14:27:34

Last modified on: Monday, 20 November 2023, 11:44:19

Long-term archiving on: Wednesday, 23 January 2013, 03:35:32

Dates and versions

hal-00743781 , version 1 (22-10-2012)

hal-00743781 , version 2 (03-09-2013)

hal-00743781 , version 3 (07-07-2014)

hal-00743781 , version 4 (01-04-2015)

hal-00743781 , version 5 (09-09-2015)

hal-00743781 , version 6 (29-02-2016)

Identifiers

HAL Id : hal-00743781 , version 1

Cite

Ina Maria Maaike Verloop. Asymptotic optimal control of multi-class restless bandits. 2012. ⟨hal-00743781v1⟩

516 views

895 downloads