APPENDIX

We present implementation details for each of the three RL baselines that we experiment with (see Sec. IV of the main paper).
A. Deep Q-Network
• Implementation: Keras-rl
• Normalization of inputs
• Adam: learning rate = 0.001, β1 = 0.9, β2 = 0.999
• Policy: Boltzmann policy (softmax) with temperature 1
• 1500 timesteps warmup
• Soft updates of the target network
• Replay buffer size: 500000
• Architecture: Input (shape = (64, 3)) - Convolution 1-D (filters: 32, kernel size: 8, stride: 1) - Convolution 1-D (48, 4, 1) - Convolution 1-D (64, 3, 1) - Max Pooling 1-D (a sketch of this setup follows the list)
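The following is a minimal Keras-rl sketch of this configuration, assuming an observation of shape (64, 3). The action count, the ReLU activations, the dense Q-value head, and the soft-update coefficient are illustrative assumptions; they are not specified in this appendix.

# Minimal sketch of the DQN baseline above, assuming keras-rl.
from keras.models import Sequential
from keras.layers import Reshape, Conv1D, MaxPooling1D, Flatten, Dense
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import BoltzmannQPolicy

nb_actions = 4  # hypothetical action count

model = Sequential([
    # keras-rl feeds (window_length,) + observation shape; drop the window axis
    Reshape((64, 3), input_shape=(1, 64, 3)),
    Conv1D(32, 8, strides=1, activation='relu'),
    Conv1D(48, 4, strides=1, activation='relu'),
    Conv1D(64, 3, strides=1, activation='relu'),
    MaxPooling1D(),
    Flatten(),
    Dense(nb_actions, activation='linear'),  # assumed Q-value head
])

agent = DQNAgent(
    model=model,
    nb_actions=nb_actions,
    memory=SequentialMemory(limit=500000, window_length=1),
    policy=BoltzmannQPolicy(tau=1.0),  # softmax with temperature 1
    nb_steps_warmup=1500,
    target_model_update=1e-2,  # values < 1 give soft target updates in keras-rl; coefficient assumed
)
agent.compile(Adam(lr=0.001, beta_1=0.9, beta_2=0.999))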
B. Advantage Actor-Critic
• Implementation:
• 5 actor-learner threads; all methods performed updates after every 20 actions (t_max = 20 and I_update = 20; see the sketch after this list)
• No action repeat: the action is executed on every frame (action repeat = 1)
• Architecture: Convolution 1-D (filters: 32, kernel size: 8, 1) - Convolution 1-D (48, 4, 1) - Convolution 1-D (64, 3, 1) - Max Pooling 1-D
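A minimal sketch of the 20-step return computation implied by t_max = 20 follows; the discount factor and everything around the rollout loop are assumptions, not taken from this appendix.

# Minimal sketch of n-step return computation for one 20-step rollout.
import numpy as np

T_MAX = 20
GAMMA = 0.99  # assumed discount factor; not stated in this appendix

def n_step_returns(rewards, bootstrap_value, gamma=GAMMA):
    # Accumulate returns backwards, starting from the critic's value of
    # the last state (0 if the rollout ended in a terminal state).
    returns = np.zeros(len(rewards))
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

Each of the 5 actor-learner threads collects t_max = 20 transitions, computes these returns, and applies one actor-critic gradient update before continuing.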
C. Direct Future Prediction
• Implementation: Direct-Future-Prediction-Keras
• Adam: learning rate = 0.00001, β1 = 0.9, β2 = 0.999
• Measurements used: score, number of fruits picked up
• Normalization of inputs and measurements
• 1000 timesteps warmup
• Training interval: 3 timesteps
• Policy: ε-greedy (annealed over 300000 timesteps)
• Architecture: we only modify the convolutional part, with: Convolution 1-D (filters: 32, kernel size: 8, 1) - Convolution 1-D (48, 4, 1) - Convolution 1-D (64, 3, 1) - Max Pooling 1-D; the rest is unchanged. A target-construction sketch follows the list.
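To show how the measurements above become training targets, here is a minimal sketch of future-measurement target construction in the DFP style. The temporal offsets (1, 2, 4, 8, 16, 32) follow the original DFP paper rather than this appendix, and the end-of-episode clamping is an assumption.

# Minimal sketch: build DFP targets (future changes in measurements)
# from a per-timestep measurement array [score, fruits picked up].
import numpy as np

OFFSETS = (1, 2, 4, 8, 16, 32)  # assumed from the DFP paper

def dfp_targets(measurements):
    # measurements: array of shape (T, 2), one row per timestep.
    # Returns (T, 2 * len(OFFSETS)): the change in each measurement at
    # every offset, clamped to the final frame when an offset runs past
    # the episode end (an assumed convention).
    T, n_meas = measurements.shape
    targets = np.zeros((T, n_meas * len(OFFSETS)))
    for t in range(T):
        future = [measurements[min(t + k, T - 1)] - measurements[t]
                  for k in OFFSETS]
        targets[t] = np.concatenate(future)
    return targets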
Video: Deep Q-Network agent after learning in the experiments. The video can be found here.