N. Abe, P. Melville, C. Pendus, K. Chandan, D. L. Reddy et al., Optimizing debt collections using constrained reinforcement learning, Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2010.

J. Achiam, D. Held, A. Tamar, and P. Abbeel, Constrained policy optimization. CoRR, abs/1705.10528, 2017.

E. Altman, Constrained Markov Decision Processes, 1999.
URL : https://hal.archives-ouvertes.fr/inria-00074109

R. Bellman, Dynamic programming and lagrange multipliers, Proceedings of the National Academy of Sciences of the United States of America, 1956.

J. Frederick, K. W. Beutler, and . Ross, Optimal policies for controlled markov chains with a constraint, Journal of Mathematical Analysis and Applications, vol.112, issue.1, pp.236-252, 1985.

C. Boutilier and T. Lu, Budget allocation using weakly coupled, constrained markov decision processes, Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI), 2016.

Y. Chow, M. Ghavamzadeh, L. Janson, and M. Pavone, Riskconstrained reinforcement learning with percentile risk criteria. CoRR, abs/1512.01629, 2015.

D. Ernst, P. Geurts, and L. Wehenkel, Tree-Based Batch Mode Reinforcement Learning, Journal of Machine Learning Research, 2005.

J. García and F. Fernández, A Comprehensive Survey on Safe Reinforcement Learning, Journal of Machine Learning Research, 2015.

P. Geibel and F. Wysotzki, Risk-sensitive reinforcement learning applied to control under constraints, Journal of Artificial Intelligence, vol.24, pp.81-108, 2005.

R. L. Graham, An efficient algorithm for determining the convex hull of a finite planar set, Inf. Process. Lett, 1972.

D. Kraft and K. Schnepper, Slsqp, a nonlinear programming method with quadratic programming subproblems, DLR, 1989.

R. Laroche and P. Trichelair, Safe Policy Improvement with Baseline Bootstrapping, 2017.

M. Petrik, M. Ghavamzadeh, and Y. Chow, Safe policy improvement by minimizing robust baseline regret, Advances in Neural Information Processing Systems (NIPS), 2016.