S. Agrawal and N. Goyal, Thompson sampling for contextual bandits with linear payoffs, ICML (3), pp.127-135, 2013.

D. Bouneffouf, DRARS, A Dynamic Risk-Aware Recommender System, 2013.
URL : https://hal.archives-ouvertes.fr/tel-01026136

D. Bouneffouf, A. Bouzeghoub, and A. L. Gançarski, A Contextual-Bandit Algorithm for Mobile Context-Aware Recommender System, ICONIP (3), pp.324-331, 2012.
DOI : 10.1007/978-3-642-34487-9_40

URL : https://hal.archives-ouvertes.fr/hal-00753401

O. Chapelle and L. Li, An empirical evaluation of thompson sampling, NIPS, pp.2249-2257, 2011.

R. Ganti and A. G. Gray, Building bridges: Viewing active learning from the multi-armed bandit lens. CoRR, abs/1309, 2013.

D. D. Lewis and W. A. Gale, A sequential algorithm for training text classifiers, Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '94, pp.3-12, 1994.
DOI : 10.1145/219587.219592

URL : http://arxiv.org/abs/cmp-lg/9407020

T. Osugi, D. Kim, and S. Scott, Balancing Exploration and Exploitation: A New Algorithm for Active Machine Learning, Fifth IEEE International Conference on Data Mining (ICDM'05), 2005.
DOI : 10.1109/ICDM.2005.33

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.452.9912

B. Settles, Active Learning Literature Survey, 2009.

H. S. Seung, M. Opper, and H. Sompolinsky, Query by committee, Proceedings of the fifth annual workshop on Computational learning theory , COLT '92, pp.287-294, 1992.
DOI : 10.1145/130385.130417

T. Zhang and F. J. Oles, A probability analysis on the value of unlabeled data for classification problems, 17th International Conference on Machine Learning, 2000.

X. Zhu, J. Lafferty, and Z. Ghahramani, Combining active learning and semisupervised learning using gaussian fields and harmonic functions, ICML 2003 workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, pp.58-65, 2003.