L. Bieker, D. Krajzewicz, A. P. Morra, C. Michelacci et al., Traffic simulation for all: a real world traffic scenario from the city of Bologna, 2014.

B. Bouzaiene-Ayari, C. Cheng, S. Das, R. Fiorillo et al., From single commodity to multiattribute models for locomotive optimization: A comparison of optimal integer programming and approximate dynamic programming, Transportation Science, vol.50, issue.2, pp.366-389, 2016.

M. Chen, Y. Hsiao, R. H. Reddy, and M. K. Tiwari, The self-learning particle swarm optimization approach for routing pickup and delivery of multiple products with material handling in multiple cross-docks, Transportation Research Part E: Logistics and Transportation Review, vol.91, pp.208-226, 2016.

B. Coltin and M. Veloso, Online pickup and delivery planning with transfers for mobile robots, 2014 IEEE International Conference on Robotics and Automation (ICRA), pp.5786-5791, 2014.

J. Cordeau and G. Laporte, A tabu search heuristic for the static multi-vehicle dial-a-ride problem, Transportation Research Part B: Methodological, vol.37, issue.6, pp.579-594, 2003.

Y. Duan, X. Chen, R. Houthooft, J. Schulman et al., Benchmarking deep reinforcement learning for continuous control, Proceedings of The 33rd International Conference on Machine Learning, vol.48, pp.20-22, 2016.

S. Ichoua, M. Gendreau, and J. Potvin, Exploiting knowledge about future demands for real-time vehicle dispatching, Transportation Science, vol.40, issue.2, pp.211-225, 2006.
DOI : 10.1287/trsc.1050.0114
URL : http://www.iro.umontreal.ca/%7Epotvin/soumia3_ts.pdf

D. Krajzewicz, J. Erdmann, M. Behrisch, and L. Bieker, Recent development and applications of SUMO - Simulation of Urban MObility, International Journal On Advances in Systems and Measurements, vol.5, issue.3&4, pp.128-138, 2012.

S. Krauß, Microscopic modeling of traffic flow: Investigation of collision free vehicle dynamics, 1998.

S. Mahadevan, N. Marchalleck, T. K. Das, and A. Gosavi, Self-improving factory simulation using continuous-time average-reward reinforcement learning, Proceedings of the 14th International Conference on Machine Learning, pp.202-210, 1997.

M. S. Maxwell, M. Restrepo, S. G. Henderson, and H. Topaloglu, Approximate dynamic programming for ambulance redeployment, INFORMS Journal on Computing, vol.22, issue.2, pp.266-281, 2010.
DOI : 10.1287/ijoc.1090.0345
URL : http://legacy.orie.cornell.edu/huseyin/publications/ems_ndp.pdf

C. Novoa and R. Storer, An approximate dynamic programming approach for the vehicle routing problem with stochastic demands, European Journal of Operational Research, vol.196, issue.2, pp.509-515, 2009.

S. N. Parragh, K. F. Doerner, and R. F. Hartl, Variable neighborhood search for the dial-a-ride problem, Computers and Operations Research, vol.37, issue.6, pp.1129-1138, 2010.
DOI : 10.1016/j.cor.2009.10.003

D. Sáez, C. E. Cortés, and A. Núñez, Hybrid adaptive predictive control for the multi-vehicle dynamic pick-up and delivery problem based on genetic algorithms and fuzzy clustering, Computers and Operations Research, vol.35, issue.11, pp.3412-3438, 2008.

M. Schilde, K. Doerner, and R. Hartl, Integrating stochastic time-dependent travel speed in solution methods for the dynamic dial-a-ride problem, European Journal of Operational Research, vol.238, issue.1, pp.18-30, 2014.

H. P. Simão, J. Day, A. P. George, T. Gifford, J. Nienow et al., An approximate dynamic programming algorithm for large-scale fleet management: A case application, Transportation Science, vol.43, issue.2, pp.178-197, 2009.

M. M. Solomon, Algorithms for the vehicle routing and scheduling problems with time window constraints, Operations Research, vol.35, issue.2, pp.254-265, 1987.
DOI : 10.1287/opre.35.2.254
URL : http://www.banaan.org/~vinnoo/pdf1.pdf

R. Sutton and A. Barto, Reinforcement Learning: An Introduction. A Bradford book, 1998.

R. S. Sutton, D. Precup, and S. Singh, Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, Artificial Intelligence, vol.112, issue.1-2, pp.181-211, 1999.
DOI : 10.1016/s0004-3702(99)00052-1
URL : https://doi.org/10.1016/s0004-3702(99)00052-1

P. Toth and D. Vigo, The Vehicle Routing Problem, Society for Industrial and Applied Mathematics, 2002.
URL : https://hal.archives-ouvertes.fr/hal-01223571

C. Wu, A. Kreidieh, K. Parvate, E. Vinitsky, and A. M. Bayen, Flow: Architecture and benchmarking for reinforcement learning in traffic control, CoRR, 2017.