P. Bideau and E. G. Learned-Miller, It's moving! A probabilistic model for causal motion segmentation in moving camera videos, ECCV, 2016.

T. Brox and J. Malik, Object Segmentation by Long Term Analysis of Point Trajectories, ECCV, 2010.
DOI : 10.1007/978-3-642-15555-0_21

T. Brox and J. Malik, Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, issue.3, 2011.
DOI : 10.1109/TPAMI.2010.143

S. Caelles, K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers et al., One-shot video segmentation, CVPR, 2017.

L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, Semantic image segmentation with deep convolutional nets and fully connected CRFs, ICLR, 2015.
DOI : 10.1109/tpami.2017.2699184
URL : http://arxiv.org/abs/1606.00915

K. Cho, B. van Merriënboer, Ç. Gülçehre, F. Bougares, H. Schwenk et al., Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
DOI : 10.3115/v1/D14-1179
URL : https://hal.archives-ouvertes.fr/hal-01433235

J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan et al., Long-Term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR, 2015.
DOI : 10.1109/TPAMI.2016.2599174
URL : http://arxiv.org/abs/1411.4389

A. Faktor and M. Irani, Video Object Segmentation by Non-Local Consensus voting, Proceedings of the British Machine Vision Conference 2014, 2014.
DOI : 10.5244/C.28.21

C. Finn, I. Goodfellow, and S. Levine, Unsupervised learning for physical interaction through video prediction, NIPS, 2016.

K. Fragkiadaki, G. Zhang, and J. Shi, Video segmentation by tracing discontinuities in a trajectory embedding, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247883

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, AISTATS, 2010.

A. Graves, Generating sequences with recurrent neural networks, arXiv preprint, 2013.

A. Graves, N. Jaitly, and A. Mohamed, Hybrid speech recognition with Deep Bidirectional LSTM, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013.
DOI : 10.1109/ASRU.2013.6707742

A. Graves, A. Mohamed, and G. Hinton, Speech recognition with deep recurrent neural networks, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
DOI : 10.1109/ICASSP.2013.6638947
URL : http://arxiv.org/abs/1303.5778

A. Graves and J. Schmidhuber, Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural Networks, vol.18, issue.5-6, pp.602-610, 2005.
DOI : 10.1016/j.neunet.2005.06.042
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.331.5800

S. Hochreiter, The vanishing gradient problem during learning recurrent neural nets and problem solutions, Int. J. Uncertain. Fuzziness Knowl.-Based Syst., 1998.
DOI : 10.1142/s0218488598000094

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.

J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proc. National Academy of Sciences, pp.2554-2558, 1982.

M. Keuper, B. Andres, and T. Brox, Motion Trajectory Segmentation via Minimum Cost Multicuts, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.374

A. Khoreva, F. Perazzi, R. Benenson, B. Schiele, and A. Sorkine-Hornung, Learning video object segmentation from static images, CVPR, 2017.

P. Krähenbühl and V. Koltun, Efficient inference in fully connected CRFs with Gaussian edge potentials, NIPS, 2011.

Y. J. Lee, J. Kim, and K. Grauman, Key-segments for video object segmentation, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126471
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.269.2727

N. Mayer, E. Ilg, P. Häusser, P. Fischer, D. Cremers et al., A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.438

T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, Recurrent neural network based language model, Interspeech, 2010.

J. Y. Ng, M. J. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga et al., Beyond short snippets: Deep networks for video classification, CVPR, 2015.

P. Ochs and T. Brox, Object segmentation in video: A hierarchical variational approach for turning point trajectories into dense regions, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126418

P. Ochs, J. Malik, and T. Brox, Segmentation of Moving Objects by Long Term Video Analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.36, issue.6, pp.1187-1200, 2014.
DOI : 10.1109/TPAMI.2013.242

A. Papazoglou and V. Ferrari, Fast Object Segmentation in Unconstrained Video, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.223

V. Patraucean, A. Handa, and R. Cipolla, Spatio-temporal video autoencoder with differentiable memory, ICLR Workshop track, 2016.

F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross, and A. Sorkine-Hornung, A benchmark dataset and evaluation methodology for video object segmentation, CVPR, 2016.

P. O. Pinheiro, T. Lin, R. Collobert, and P. Dollár, Learning to Refine Object Segments, 2016.
DOI : 10.5244/C.30.15
URL : http://arxiv.org/abs/1603.08695

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS, 2015.
DOI : 10.1109/TPAMI.2016.2577031
URL : http://arxiv.org/abs/1506.01497

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Nature, vol.323, issue.6088, 1986.
DOI : 10.1038/323533a0

X. Shi, Z. Chen, H. Wang, D. Yeung, W. Wong et al., Convolutional LSTM network: A machine learning approach for precipitation nowcasting, NIPS, 2015.

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, NIPS, 2014.

N. Srivastava, E. Mansimov, and R. Salakhutdinov, Unsupervised learning of video representations using LSTMs, ICML, 2015.

B. Taylor, V. Karasev, and S. Soatto, Causal video object segmentation from persistence of occlusions, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7299055

P. Tokmakov, K. Alahari, and C. Schmid, Learning motion patterns in videos, CVPR, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01427480

P. Tokmakov, K. Alahari, and C. Schmid, Learning video object segmentation with visual memory, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01511145

W. Wang, J. Shen, and F. Porikli, Saliency-aware geodesic video object segmentation, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298961

P. J. Werbos, Backpropagation through time: what it does and how to do it, Proc. IEEE, pp.1550-1560, 1990.
DOI : 10.1109/5.58337