J. Références, S. Achiam, ;. Sastry, and . Achiam, The arcade learning environment : An evaluation platform for general agents (extended abstract), Proceedings of the 3rd International Conference on Development and Learning, pp.1471-1479, 2004.

;. Berlyne, ;. Daniel-e-berlyne, and . Blundell, Structure and direction in thinking, Weight uncertainty in neural networks, 1965.

R. Burda, Maximization of potential information flow as a universal utility for collective behaviour, International Conference on Learning Representations, vol.3, pp.207-213, 2002.

. Cesa-bianchi, Learning navigation behaviors end-to-end with autorl, Advances in Neural Information Processing Systems, vol.4, pp.2007-2014, 2012.

K. Abril and . Deisenroth, A unified strategy for implementing curiosity and empowerment driven reinforcement learning, Foundations and Trends R in Robotics, vol.2, issue.1-2, pp.1-142, 2013.

. Dilokthanakul, Lior Fox, Leshem Choshen, and Yonatan Loewenstein. DORA the explorer : Directed outreaching reinforcement action-selection, Carlos Florensa, Jonas Degrave, Nicolas Heess, Jost Tobias Springenberg, and Martin Riedmiller. Self-supervised learning of image embedding for continuous control, vol.15, pp.1514-1523, 2014.

. François-lavet, Curiosity driven reinforcement learning for motion planning on humanoids, Foundations and Trends R in Machine Learning, vol.11, issue.3-4, p.25, 2014.

, Bayesian reinforcement learning : A survey. Foundations and Trends R in Machine Learning, Filip De Turck, and Pieter Abbeel. Vime : Variational information maximizing exploration, vol.2, pp.1109-1117, 1999.

. Hughes, , 2018.

A. García-castañeda, I. Dunning, T. Zhu, K. R. Mckee, and R. Koster, Information thermodynamics on causal networks and its application to biochemical signal transduction, Inequity aversion resolves intertemporal social dilemmas, 2016.

L. Itti, F. Pierre, and . Baldi, Bayesian surprise attracts human attention, Advances in neural information processing systems, pp.547-554, 2006.

[. Jaques, Intrinsic social motivation via causal influence in multi-agent rl, 2018.

, Unsupervised realtime control through variational empowerment, 2017.

S. Kearns, S. Kearns, ;. Singh, and . Kempka, Vizdoom : A doom-based ai research platform for visual reinforcement learning, 2016 IEEE Conference on Computational Intelligence and Games (CIG), vol.49, pp.1-8, 2002.

W. Kingma, P. Diederik, M. Kingma, and . Welling, Auto-encoding variational bayes, 2013.

[. Kirkpatrick, Overcoming catastrophic forgetting in neural networks, Proceedings of the national academy of sciences, vol.114, pp.3521-3526, 2017.

[. Klyubin, Empowerment : A universal agent-centric measure of control, The 2005 IEEE Congress on, vol.1, pp.128-135, 2005.

[. Kulkarni, Hierarchical deep reinforcement learning : Integrating temporal abstraction and intrinsic motivation, Advances in neural information processing systems, pp.3675-3683, 2016.

[. Kulkarni, Option discovery in hierarchical reinforcement learning using spatio-temporal clustering, Deep successor reinforcement learning, pp.329-336, 2008.

[. Leibo, , 2017.

, International Foundation for Autonomous Agents and Multiagent Systems, Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pp.464-473, 2017.

, Hierarchical reinforcement learning with hindsight, International Conference on Learning Representations, 2019.

[. Lillicrap, Learning and exploration in action-perception loops, Continuous control with deep reinforcement learning, vol.7, p.37, 2013.

, Exploration in model-based reinforcement learning by empirically estimating learning progress, Proceedings of the 34th International Conference on Machine Learning, vol.45, pp.2295-2304, 2012.

[. Machado, Shakir Mohamed and Danilo Jimenez Rezende. Variational information maximisation for intrinsically motivated reinforcement learning, Changjae Oh and Andrea Cavallaro. Learning action representations for self-supervised visual exploration, vol.15, pp.278-287, 1989.

, Actionconditional video prediction using deep networks in atari games, Proceedings of the 8th International Conference on Epigenetic Robotics : Modeling Cognitive Development in Robotic Systems, vol.1, pp.492-502, 2008.

. Pathak, Unsupervised methods for subgoal discovery during intrinsic motivation in model-free hierarchical reinforcement learning, Proceedings of the 35th International Conference on Machine Learning, ICML 2018, vol.2017, pp.1312-1320, 1952.

J. Schmidhuber, ;. Schwenker, G. Palm, and ;. Sequeira, Driven by compression progress : A simple principle explains essential aspects of subjective beauty, novelty, surprise, interestingness, attention, curiosity, creativity, art, science, music, jokes, 2011 IEEE International Conference on Development and Learning (ICDL), vol.2, pp.70-82, 1991.

;. B. Skinner and . Skinner-;-stadie, Incentivizing exploration in reinforcement learning with deep predictive models, New York : Appleton, 1938.

C. Stanton, J. Stanton, and . Clune, Still and Precup, 2012] Susanne Still and Doina Precup. An information-theoretic approach to curiositydriven reinforcement learning, Theory in Biosciences, vol.131, issue.3, pp.139-148, 2012.

, Between mdps and semi-mdps : A framework for temporal abstraction in reinforcement learning, Reward shaping with recurrent neural networks for speeding up on-line policy learning in spoken dialogue systems, vol.112, pp.3540-3549, 1998.

W. , Unsupervised control through non-parametric discriminative rewards, 2015 IEEE congress on evolutionary computation (CEC), vol.66, pp.715-770, 1959.

, Vision-based robot navigation through combining unsupervised learning and hierarchical reinforcement learning, Scheduled intrinsic drive : A hierarchical take on intrinsically motivated exploration, vol.26, p.1576, 2015.