M. Babaeizadeh, C. Finn, D. Erhan, R. Campbell, and S. Levine, Stochastic variational video prediction, International Conference on Learning Representations, 2018.

J. Bayer and C. Osendorfer, Learning stochastic recurrent networks, 2014.

J. Walker, A. Gupta, and M. Hebert, Dense optical flow prediction from a static image, The IEEE International Conference on Computer Vision (ICCV), pp.2443-2451, 2015.

J. Walker, C. Doersch, A. Gupta, and M. Hebert, An uncertain future: Forecasting from static images using variational autoencoders, The European Conference on Computer Vision (ECCV), pp.835-851, 2016.

T. Wang, M. Liu, J. Zhu, G. Liu, A. Tao et al., Video-to-video synthesis, Advances in Neural Information Processing Systems, vol.31, pp.1144-1156, 2018.

D. Weissenborn, O. Täckström, and J. Uszkoreit, Scaling autoregressive video models, International Conference on Learning Representations, 2020.

N. Wichers, R. Villegas, D. Erhan, and H. Lee, Hierarchical long-term video prediction without supervision, Proceedings of the 35th International Conference on Machine Learning, vol.80, pp.6038-6046, 2018.

Y. Wu, R. Gao, J. Park, and Q. Chen, Future video synthesis with object motion prediction, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.5539-5548, 2020.

J. Xu, B. Ni, Z. Li, S. Cheng, and X. Yang, Structure preserving video prediction, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1460-1469, 2018.

J. Xu, B. Ni, and X. Yang, Video prediction via selective sampling, Advances in Neural Information Processing Systems, vol.31, pp.1705-1715, 2018.

T. Xue, J. Wu, K. L. Bouman, and W. T. Freeman, Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks, Advances in Neural Information Processing Systems, vol.29, pp.91-99, 2016.

Y. Li and S. Mandt, Disentangled sequential autoencoder, Proceedings of the 35th International Conference on Machine Learning, vol.80, pp.5670-5679, 2018.

Ç. Yıldız, M. Heinonen, and H. Lähdesmäki, ODE²VAE: Deep generative second order ODEs with Bayesian neural networks, Advances in Neural Information Processing Systems, vol.32, pp.13412-13421, 2019.

M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Póczos, R. Salakhutdinov et al., Deep sets, Advances in Neural Information Processing Systems, vol.30, pp.3391-3401, 2017.

R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, The unreasonable effectiveness of deep features as a perceptual metric, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.586-595, 2018.