Online shortest path routing: The value of information, Proceedings of the American Control Conference (ACC), 2014.
URL: https://hal.archives-ouvertes.fr/hal-00920068
Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches, Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), pp.45-53, 2004.
Adaptive routing using expert advice, The Computer Journal, vol.49, issue.2, pp.180-189, 2006.
The on-line shortest path problem under partial monitoring, Journal of Machine Learning Research, vol.8, pp.2369-2403, 2007.
Endhost-based shortest path routing in dynamic networks, Proceedings of the 32nd IEEE International Conference on Computer Communications (INFOCOM), pp.2202-2210, 2013.
Big data for autonomic intercontinental overlays, IEEE Journal on Selected Areas in Communications, vol.34, issue.3, pp.575-583, 2016.
URL: https://hal.archives-ouvertes.fr/hal-01461990
Combinatorial bandits, Journal of Computer and System Sciences, vol.78, issue.5, pp.1404-1422, 2012.
Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, pp.235-256, 2002.
Combinatorial multi-armed bandit: General framework and applications, Proceedings of the 30th International Conference on Machine Learning (ICML), pp.151-159, 2013.
Thompson sampling for complex online problems, Proceedings of the 31st International Conference on Machine Learning (ICML), pp.100-108, 2014.
Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.
Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
Regret in online combinatorial optimization, Mathematics of Operations Research, vol.39, issue.1, pp.31-45, 2014.
Towards minimax policies for online linear optimization with bandit feedback, Proceedings of the 25th Conference On Learning Theory (COLT), 2012.
An efficient algorithm for learning with semi-bandit feedback, Algorithmic Learning Theory (ALT), pp.234-248, 2013.
Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations, IEEE/ACM Transactions on Networking, vol.20, issue.5, pp.1466-1478, 2012.
Tight regret bounds for stochastic combinatorial semi-bandits, Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS), 2015.
Combinatorial bandits revisited, Advances in Neural Information Processing Systems (NIPS), 2015.
URL: https://hal.archives-ouvertes.fr/hal-01257796
Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part I: IID rewards, IEEE Transactions on Automatic Control, vol.32, issue.11, pp.968-976, 1987.
Matroid bandits: Fast combinatorial optimization with learning, Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI), 2014.
Learning multiuser channel allocations in cognitive radio networks: A combinatorial multi-armed bandit formulation, Proceedings of the IEEE Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN), 2010.
Efficient learning in large-scale combinatorial semi-bandits, Proceedings of the 32nd International Conference on Machine Learning (ICML), pp.1113-1122, 2015.
On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, issue.3/4, pp.285-294, 1933.
Adaptive shortest-path routing under unknown and stochastically varying link states, Proceedings of the 10th International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt), pp.232-237, 2012.
URL: https://hal.archives-ouvertes.fr/hal-00763780
Distributed online learning of the shortest path under unknown random edge weights, Proceedings of the 38th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.3138-3142, 2013.
Optimal adaptive policies for Markov decision processes, Mathematics of Operations Research, vol.22, issue.1, pp.222-255, 1997.
Markov Decision Processes: Discrete Stochastic Dynamic Programming, 2005.
Near-optimal regret bounds for reinforcement learning, Journal of Machine Learning Research, vol.11, pp.1563-1600, 2010.
Optimism in reinforcement learning and Kullback-Leibler divergence, Proceedings of the 48th Annual Allerton Conference on Communication, Control, and Computing, pp.115-122, 2010.
URL: https://hal.archives-ouvertes.fr/hal-00476116
Asymptotically efficient adaptive choice of control laws in controlled Markov chains, SIAM Journal on Control and Optimization, vol.35, issue.3, pp.715-743, 1997.
Convolution of geometrics and a reliability problem, Statistics & Probability Letters, vol.43, issue.4, pp.421-426, 1999.
Semi-infinite programming, duality, discretization and optimality conditions, Optimization, vol.58, issue.2, pp.133-161, 2009.
On upper-confidence bound policies for non-stationary bandit problems, 2008.
URL: https://hal.archives-ouvertes.fr/hal-00281392
Unimodal bandits: Regret lower bounds and optimal algorithms, 2014.
URL: https://hal.archives-ouvertes.fr/hal-01092662
Online learning under delayed feedback, Proceedings of the 30th International Conference on Machine Learning (ICML), pp.1453-1461, 2013.
Lipschitz bandits: Regret lower bounds and optimal algorithms, Proceedings of the 27th Conference on Learning Theory (COLT), 2014.
URL: https://hal.archives-ouvertes.fr/hal-01092791
Explore first, exploit next: The true shape of regret in bandit problems, 2016.
URL: https://hal.archives-ouvertes.fr/hal-01276324
The KL-UCB algorithm for bounded stochastic bandits and beyond, Proceedings of the 24th Conference On Learning Theory (COLT), 2011.