Competing in the dark: An efficient algorithm for bandit linear optimization, Proceedings of the 21st Annual Conference on Learning Theory (COLT), pp.263-274, 2008.
Regret in online combinatorial optimization, Mathematics of Operations Research, 2014.
Online learning. Lecture notes, 2011.
Convex Optimization, 2004.
Hierarchical relative entropy policy search, Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, pp.273-281, 2012.
Better rates for any adversarial deterministic MDP, Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp.675-683, 2013.
Experts in a Markov decision process, NIPS-17, pp.401-408, 2005.
Online Markov Decision Processes, Mathematics of Operations Research, vol.34, issue.3, pp.726-736, 2009.
DOI : 10.1287/moor.1090.0396
The on-line shortest path problem under partial monitoring, Journal of Machine Learning Research, vol.8, pp.2369-2403, 2007.
A natural policy gradient, Advances in Neural Information Processing Systems 14 (NIPS), pp.1531-1538, 2001.
Hedging structured concepts, Proceedings of the 23rd Annual Conference on Learning Theory (COLT), pp.93-105, 2010.
Régularisation d'inéquations variationnelles par approximations successives, ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique, vol.4, issue.R3, pp.154-158, 1970.
The online loop-free stochastic shortest-path problem, Proceedings of the 23rd Annual Conference on Learning Theory (COLT), pp.231-243, 2010.
The adversarial stochastic shortest path problem with unknown transition probabilities, AISTATS 2012, pp.805-813, 2012.
Online Markov Decision Processes Under Bandit Feedback, NIPS-23, pp.1804-1812, 2010.
DOI : 10.1109/TAC.2013.2292137
URL : https://hal.archives-ouvertes.fr/hal-01079422
Relative entropy policy search, AAAI 2010, pp.1607-1612, 2010.
Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
DOI : 10.1002/9780470316887
Lecture notes on online learning, 2009.
Monotone Operators and the Proximal Point Algorithm, SIAM Journal on Control and Optimization, vol.14, issue.5, pp.877-898, 1976.
DOI : 10.1137/0314056
Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192
Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol.4, issue.1, 2010.
DOI : 10.2200/S00268ED1V01Y201005AIM009
Markov Decision Processes with Arbitrary Reward Processes, Mathematics of Operations Research, vol.34, issue.3, pp.737-757, 2009.
DOI : 10.1287/moor.1090.0397
Online convex programming and generalized infinitesimal gradient ascent, Proceedings of the Twentieth International Conference on Machine Learning, pp.928-936, 2003.