Low-rank Bandits with Latent Mixtures

Aditya Gopalan; Odalric-Ambrym Maillard; Mohammadi Zaki

Preprint, Working Paper. Year: 2016


Aditya Gopalan

Role: Author

Indian Institute of Science [Bangalore]

Odalric-Ambrym Maillard

Role: Author
PersonId : 5563
IdHAL : odalric-ambrym-maillard
ORCID : 0000-0001-7935-7026
IdRef : 158055594

Machine Learning and Optimisation

Mohammadi Zaki

Role: Author

Indian Institute of Science [Bangalore]

Abstract

We study the task of maximizing rewards from recommending items (actions) to users who interact sequentially with a recommender system. Users are modeled as latent mixtures of $C$ representative user classes, where each class specifies a mean reward profile across actions. Both the user features (mixture distribution over classes) and the item features (mean reward vector per class) are unknown a priori. The user identity is the only contextual information available to the learner while interacting. This induces a low-rank structure on the matrix of expected rewards $r_{a,b}$ from recommending item $a$ to user $b$. The problem reduces to the well-known linear bandit when either the user-side or the item-side features are perfectly known. In the setting where each user, with its stochastically sampled taste profile, interacts only for a small number of sessions, we develop a bandit algorithm for the two-sided uncertainty. It combines the Robust Tensor Power Method of Anandkumar et al. (2014b) with the OFUL linear bandit algorithm of Abbasi-Yadkori et al. (2011). We provide the first rigorous regret analysis of this combination, showing that its regret after $T$ user interactions is $\tilde{O}(C\sqrt{BT})$, where $B$ is the number of users. A key ingredient in this result is a novel robustness property of OFUL, which is of independent interest.
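The low-rank structure described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the toy sizes (4 items, 5 users, 2 classes), the numeric values, and the helper names `expected_rewards` and `matrix_rank` are all hypothetical. It assumes the expected reward is $r_{a,b} = \sum_{c} \mu_{a,c}\, p_{b,c}$, with $\mu_{a,c}$ the mean reward of item $a$ under class $c$ and $p_{b,c}$ the mixture weight of class $c$ for user $b$, and checks that the resulting matrix has rank at most $C$.

```python
from fractions import Fraction


def expected_rewards(item_means, user_mixtures):
    """Return r with r[a][b] = sum_c item_means[a][c] * user_mixtures[b][c]."""
    return [
        [sum(m * p for m, p in zip(item_row, user_row)) for user_row in user_mixtures]
        for item_row in item_means
    ]


def matrix_rank(matrix):
    """Exact rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Eliminate the column entry in every row below the pivot row.
        for i in range(rank + 1, len(rows)):
            factor = rows[i][col] / rows[rank][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank


# Hypothetical toy instance: 4 items, 5 users, C = 2 latent classes.
item_means = [  # item_means[a][c]: mean reward of item a under class c
    [Fraction(9, 10), Fraction(1, 10)],
    [Fraction(2, 10), Fraction(8, 10)],
    [Fraction(5, 10), Fraction(5, 10)],
    [Fraction(7, 10), Fraction(3, 10)],
]
user_mixtures = [  # user_mixtures[b]: probability vector over the classes
    [Fraction(1), Fraction(0)],
    [Fraction(0), Fraction(1)],
    [Fraction(3, 5), Fraction(2, 5)],
    [Fraction(3, 4), Fraction(1, 4)],
    [Fraction(1, 4), Fraction(3, 4)],
]

rewards = expected_rewards(item_means, user_mixtures)
rank = matrix_rank(rewards)

# If the item features were known, each user's expected reward would be linear
# in that user's unknown mixture vector, which is the linear bandit view.
best_item_per_user = [
    max(range(len(item_means)), key=lambda a: rewards[a][b])
    for b in range(len(user_mixtures))
]

print("rank of expected-reward matrix:", rank)
print("best item per user:", best_item_per_user)
```

In this toy instance the 4-by-5 expected-reward matrix is a product of a 4-by-2 and a 2-by-5 factor, so its rank cannot exceed the number of classes. A learner in the paper's setting observes only noisy rewards and must estimate both factors, which this sketch does not attempt.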

Keywords

online learning; reinforcement learning; recommender systems; low-rank matrices; multi-armed bandits

Domains

Machine Learning [cs.LG]

Main file

1609.01508.pdf (425.74 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-01400318

Submitted on: Monday, 21 November 2016, 17:28:45

Last modified on: Tuesday, 13 February 2024, 03:25:13

Long-term archiving on: Monday, 20 March 2017, 21:06:22

Dates and versions

hal-01400318 , version 1 (21-11-2016)

Identifiers

HAL Id : hal-01400318 , version 1
ARXIV : 1609.01508

Cite

Aditya Gopalan, Odalric-Ambrym Maillard, Mohammadi Zaki. Low-rank Bandits with Latent Mixtures. 2016. ⟨hal-01400318⟩


Collections

CNRS INRIA UMR8623 CENTRALESUPELEC INRIA2 LRI-AO UNIV-PARIS-SACLAY GS-COMPUTER-SCIENCE

430 views

119 downloads
