Offline A/B testing for recommender systems - HAL open archive
Conference paper, Year: 2018

Offline A/B testing for recommender systems

Alexandre Gilotte
  • Role: Author
Clément Calauzènes
Thomas Nedelec
Alexandre Abraham
Simon Dollé
  • Role: Author

Abstract

Online A/B testing evaluates the impact of a new technology by running it in a real production environment and testing its performance on a subset of the users of the platform. It is a well-known practice to run a preliminary offline evaluation on historical data to iterate faster on new ideas, and to detect poor policies in order to avoid losing money or breaking the system. For such offline evaluations, we are interested in methods that can estimate offline the potential uplift in performance generated by a new technology. Offline performance can be measured using estimators known as counterfactual or off-policy estimators. Traditional counterfactual estimators, such as capped importance sampling or normalised importance sampling, exhibit unsatisfying bias-variance compromises when experimenting on personalized product recommendation systems. To overcome this issue, we model the bias incurred by these estimators rather than bound it in the worst case, which leads us to propose a new counterfactual estimator. We provide a benchmark of the different estimators showing their correlation with business metrics observed by running online A/B tests on a large-scale commercial recommender system.
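For readers unfamiliar with the baseline estimators named in the abstract, the following is a minimal sketch of the textbook forms of importance sampling, capped importance sampling, and normalised importance sampling. It is not the new estimator proposed in the paper, and it is not taken from the paper's code. All function and parameter names (`importance_weights`, `pi_test`, `pi_logging`, `cap`) are hypothetical, and the sketch assumes that for each logged recommendation we know the reward and the probability with which both the logging policy and the candidate policy would have shown it.

```python
from typing import List, Sequence


def importance_weights(pi_test: Sequence[float],
                       pi_logging: Sequence[float]) -> List[float]:
    """Ratio of candidate-policy to logging-policy probability per logged action."""
    return [p / q for p, q in zip(pi_test, pi_logging)]


def is_estimate(rewards: Sequence[float], weights: Sequence[float]) -> float:
    """Plain importance sampling: unbiased, but variance grows with large weights."""
    return sum(w * r for w, r in zip(weights, rewards)) / len(rewards)


def capped_is_estimate(rewards: Sequence[float], weights: Sequence[float],
                       cap: float) -> float:
    """Capped importance sampling: clip each weight at `cap` (assumed threshold).

    Clipping lowers variance at the cost of introducing bias.
    """
    return sum(min(w, cap) * r for w, r in zip(weights, rewards)) / len(rewards)


def normalised_is_estimate(rewards: Sequence[float],
                           weights: Sequence[float]) -> float:
    """Normalised importance sampling: divide by the sum of weights, not the count."""
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


if __name__ == "__main__":
    # Toy logged data, invented purely for illustration.
    rewards = [1.0, 0.0, 1.0, 0.0]
    w = importance_weights(pi_test=[0.25, 0.5, 0.5, 0.25],
                           pi_logging=[0.5, 0.5, 0.25, 0.25])
    print(is_estimate(rewards, w))
    print(capped_is_estimate(rewards, w, cap=1.0))
    print(normalised_is_estimate(rewards, w))
```

On the toy data the three estimators disagree, which illustrates the bias-variance trade-off the abstract refers to: capping and normalising both change the estimate relative to plain importance sampling.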
Main file
1801.07030.pdf (1.1 MB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-02457457, version 1 (26-01-2024)

Cite

Alexandre Gilotte, Clément Calauzènes, Thomas Nedelec, Alexandre Abraham, Simon Dollé. Offline A/B testing for recommender systems. WSDM '18 - The 11th ACM International Conference on Web Search and Data Mining, Feb 2018, Los Angeles, United States. pp.198-206, ⟨10.1145/3159652.3159687⟩. ⟨hal-02457457⟩

