
Offline A/B Testing for Recommender Systems

Abstract: Online A/B testing evaluates the impact of a new technology by running it in a real production environment and testing its performance on a subset of the platform's users. It is common practice to run a preliminary offline evaluation on historical data, both to iterate faster on new ideas and to detect poor policies before they lose money or break the system. For such offline evaluations, we are interested in methods that can compute offline an estimate of the potential performance uplift generated by a new technology. Offline performance can be measured using estimators known as counterfactual or off-policy estimators. Traditional counterfactual estimators, such as capped importance sampling or normalised importance sampling, exhibit unsatisfying bias-variance trade-offs when applied to personalised product recommendation systems. To overcome this issue, we model the bias incurred by these estimators rather than bounding it in the worst case, which leads us to propose a new counterfactual estimator. We provide a benchmark of the different estimators, showing their correlation with business metrics observed by running online A/B tests on a large-scale commercial recommender system.
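The two traditional estimators named in the abstract can be illustrated with a minimal sketch (an illustrative implementation, not the paper's code): given logged rewards together with the propensities of the logging policy and of the new (target) policy, plain, capped, and normalised importance sampling differ only in how the importance weights are used. The function names and the cap value below are assumptions made for illustration.

```python
import numpy as np

def ips(rewards, target_probs, logging_probs):
    """Plain importance sampling: reweight logged rewards by the
    probability ratio of the target and logging policies."""
    w = target_probs / logging_probs
    return np.mean(w * rewards)

def capped_ips(rewards, target_probs, logging_probs, cap=10.0):
    """Capped IPS: clip importance weights at `cap`, trading a
    worst-case bias for lower variance."""
    w = np.minimum(target_probs / logging_probs, cap)
    return np.mean(w * rewards)

def normalised_ips(rewards, target_probs, logging_probs):
    """Normalised (self-normalised) IPS: divide by the sum of the
    weights instead of the sample size."""
    w = target_probs / logging_probs
    return np.sum(w * rewards) / np.sum(w)
```

As a sanity check, when the target policy equals the logging policy every weight is 1 and all three estimators reduce to the empirical mean reward; the estimators start to disagree, and the bias-variance trade-off appears, as the two policies diverge.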
Contributor: Clément Calauzènes
Submitted on: Tuesday, January 28, 2020 - 10:15:15 AM
Last modification on: Monday, November 16, 2020 - 2:58:04 PM

Alexandre Gilotte, Clément Calauzènes, Thomas Nedelec, Alexandre Abraham, Simon Dollé. Offline A/B Testing for Recommender Systems. Eleventh ACM International Conference on Web Search and Data Mining, Feb 2018, Marina Del Rey, United States. pp.198-206, ⟨10.1145/3159652.3159687⟩. ⟨hal-02457457⟩


