D. P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, vol.2, 2012.

D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control: The Discrete Time Case, 1978.

L. Buşoniu, D. Ernst, B. De Schutter, and R. Babuška, Approximate dynamic programming with a fuzzy parameterization, Automatica, vol.46, issue.5, pp.804-814, 2010.

N. Chatzipanagiotis, Y. Liu, A. Petropulu, and M. M. Zavlanos, Controlling groups of mobile beamformers, Proceedings 51st IEEE Conference on Decision and Control (CDC), pp.1984-1989, 2012.

J. Fink, A. Ribeiro, and V. Kumar, Robust control for mobility and wireless communication in cyber-physical systems with application to robot teams, Proceedings of the IEEE, vol.100, issue.1, pp.164-178, 2012.

R. Gangula, P. de Kerret, O. Esrafilian, and D. Gesbert, Trajectory optimization for mobile access point, 51st Asilomar Conference on Signals, Systems, and Computers, pp.1412-1416, 2017.

D. B. Licea, V. S. Varma, S. Lasaulce, J. Daafouz, and M. Ghogho, Trajectory planning for energy-efficient vehicles with communications constraints, Proceedings 2016 International Conference on Wireless Networks and Mobile Communications (WINCOM16), pp.264-270, 2016.

D. B. Licea, V. S. Varma, S. Lasaulce, J. Daafouz, M. Ghogho et al., Robust trajectory planning for robotic communications under fading channels, Ubiquitous Networking: Third International Symposium, vol.10542, p.450, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01745270

L. Lin, Self-improving reactive agents based on reinforcement learning, planning and teaching, Machine Learning, vol.8, issue.3-4, pp.293-321, 1992.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol.518, pp.529-533, 2015.

A. W. Moore and C. G. Atkeson, Prioritized sweeping: Reinforcement learning with less data and less time, Machine Learning, vol.13, pp.103-130, 1993.

R. Olfati-Saber, J. A. Fax, and R. M. Murray, Consensus and cooperation in networked multi-agent systems, Proceedings of the IEEE, vol.95, issue.1, pp.215-233, 2007.

C. C. Ooi and C. Schindelhauer, Minimal energy path planning for wireless robots, Mobile Networks and Applications, vol.14, issue.3, pp.309-321, 2009.

P. Pietraski, G. Charlton, R. Yang, and C. Wang, Enhanced cell-edge performance with transmit power-shaping and multipoint, multiflow techniques, ZTE Communications, issue.4, 2011.

M. N. Rooker and A. Birk, Multi-robot exploration under the constraints of wireless networking, Control Engineering Practice, vol.15, issue.4, pp.435-445, 2007.

R. S. Sutton, Integrated architectures for learning, planning, and reacting based on approximating dynamic programming, Proceedings 7th International Conference on Machine Learning (ICML-90), pp.216-224, 1990.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed., ser. Adaptive Computation and Machine Learning, MIT Press, 2018.

C. J. Watkins and P. Dayan, Q-learning, Machine Learning, vol.8, pp.279-292, 1992.

Y. Yan and Y. Mostofi, Co-optimization of communication and motion planning of a robotic operation under resource constraints and in fading environments, IEEE Transactions on Wireless Communications, vol.12, issue.4, pp.1562-1572, 2013.