E. H. Adelson, On seeing stuff: The perception of materials by humans and machines, Proc. SPIE, 2001.

V. Badrinarayanan, F. Galasso, and R. Cipolla, Label propagation in video sequences, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
DOI : 10.1109/CVPR.2010.5540054

URL : http://mi.eng.cam.ac.uk/%7Ecipolla/publications/inproceedings/2010-CVPR-label-propagation.pdf

N. Ballas, L. Yao, C. Pal, and A. Courville, Delving deeper into convolutional networks for learning video representations, ICLR, 2016.

P. Bideau and E. G. Learned-Miller, It's Moving! A Probabilistic Model for Causal Motion Segmentation in Moving Camera Videos, ECCV, 2016.
DOI : 10.1007/978-3-319-46484-8_26

W. Brendel and S. Todorovic, Video object segmentation by tracking regions, 2009 IEEE 12th International Conference on Computer Vision (ICCV), 2009.
DOI : 10.1109/ICCV.2009.5459242

URL : http://web.engr.oregonstate.edu/~sinisa/research/publications/iccv09_video.pdf

T. Brox and J. Malik, Object Segmentation by Long Term Analysis of Point Trajectories, ECCV, 2010.
DOI : 10.1007/978-3-642-15555-0_21

T. Brox and J. Malik, Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, issue.3, pp.500-513, 2011.
DOI : 10.1109/TPAMI.2010.143

S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, and L. Van Gool, One-shot video object segmentation, CVPR, 2017.
DOI : 10.1109/cvpr.2017.565

L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, Semantic image segmentation with deep convolutional nets and fully connected CRFs, ICLR, 2015.

L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
DOI : 10.1109/TPAMI.2017.2699184

K. Cho, B. van Merriënboer, Ç. Gülçehre, F. Bougares, H. Schwenk et al., Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
DOI : 10.3115/v1/D14-1179

URL : https://hal.archives-ouvertes.fr/hal-01433235

J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan et al., Long-term recurrent convolutional networks for visual recognition and description, CVPR, 2015.
DOI : 10.1109/tpami.2016.2599174

URL : http://arxiv.org/abs/1411.4389

A. Dosovitskiy, P. Fischer, E. Ilg, P. Häusser, C. Hazırbaş et al., FlowNet: Learning Optical Flow with Convolutional Networks, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.316

A. Faktor and M. Irani, Video segmentation by non-local consensus voting, BMVC, 2014.

C. Finn, I. Goodfellow, and S. Levine, Unsupervised learning for physical interaction through video prediction, NIPS, 2016.

K. Fragkiadaki, G. Zhang, and J. Shi, Video segmentation by tracing discontinuities in a trajectory embedding, 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
DOI : 10.1109/CVPR.2012.6247883

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, AISTATS, 2010.

A. Graves, Generating sequences with recurrent neural networks, arXiv preprint arXiv:1308.0850, 2013.

A. Graves, N. Jaitly, and A. Mohamed, Hybrid speech recognition with Deep Bidirectional LSTM, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013.
DOI : 10.1109/ASRU.2013.6707742

A. Graves, A. Mohamed, and G. Hinton, Speech recognition with deep recurrent neural networks, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
DOI : 10.1109/ICASSP.2013.6638947

A. Graves and J. Schmidhuber, Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural Networks, vol.18, issue.5-6, pp.602-610, 2005.
DOI : 10.1016/j.neunet.2005.06.042

M. Grundmann, V. Kwatra, M. Han, and I. Essa, Efficient hierarchical graph-based video segmentation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
DOI : 10.1109/CVPR.2010.5539893

S. Hochreiter, The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol.06, issue.02, pp.107-116, 1998.
DOI : 10.1142/S0218488598000094

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.
DOI : 10.1162/neco.1997.9.8.1735

J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proceedings of the National Academy of Sciences, vol.79, issue.8, pp.2554-2558, 1982.
DOI : 10.1073/pnas.79.8.2554

F. Huguet and F. Devernay, A Variational Method for Scene Flow Estimation from Stereo Sequences, 2007 IEEE 11th International Conference on Computer Vision (ICCV), 2007.
DOI : 10.1109/ICCV.2007.4409000

URL : https://hal.archives-ouvertes.fr/inria-00262139

E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy et al., FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.179

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, ICML, 2015.

S. D. Jain, B. Xiong, and K. Grauman, FusionSeg: Learning to Combine Motion and Appearance for Fully Automatic Segmentation of Generic Objects in Videos, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.228

M. Keuper, B. Andres, and T. Brox, Motion Trajectory Segmentation via Minimum Cost Multicuts, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.374

A. Khoreva, F. Galasso, M. Hein, and B. Schiele, Classifier based graph construction for video segmentation, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298697

A. Khoreva, F. Perazzi, R. Benenson, B. Schiele, and A. Sorkine-Hornung, Learning video object segmentation from static images, CVPR, 2017.

Y. J. Koh and C. S. Kim, Primary Object Segmentation in Videos Based on Region Augmentation and Reduction, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.784

P. Krähenbühl and V. Koltun, Efficient inference in fully connected CRFs with Gaussian edge potentials, NIPS, 2011.

Y. J. Lee, J. Kim, and K. Grauman, Key-segments for video object segmentation, 2011 International Conference on Computer Vision (ICCV), 2011.
DOI : 10.1109/ICCV.2011.6126471

J. Lezama, K. Alahari, J. Sivic, and I. Laptev, Track to the future: Spatio-temporal video segmentation with long-range motion cues, CVPR, 2011.
DOI : 10.1109/CVPR.2011.6044588

URL : https://hal.archives-ouvertes.fr/hal-00817961

F. Li, T. Kim, A. Humayun, D. Tsai, and J. M. Rehg, Video Segmentation by Tracking Many Figure-Ground Segments, 2013 IEEE International Conference on Computer Vision (ICCV), 2013.
DOI : 10.1109/ICCV.2013.273

T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona et al., Microsoft COCO: Common Objects in Context, ECCV, 2014.
DOI : 10.1007/978-3-319-10602-1_48

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298965

N. Mayer, E. Ilg, P. Häusser, P. Fischer, D. Cremers et al., A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.438

T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, Recurrent neural network based language model, Interspeech, 2010.

M. Narayana, A. R. Hanson, and E. G. Learned-Miller, Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations, 2013 IEEE International Conference on Computer Vision (ICCV), 2013.
DOI : 10.1109/ICCV.2013.199

J. Y. Ng, M. J. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga et al., Beyond short snippets: Deep networks for video classification, CVPR, 2015.

P. Ochs and T. Brox, Higher order motion models and spectral clustering, 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
DOI : 10.1109/CVPR.2012.6247728

P. Ochs, J. Malik, and T. Brox, Segmentation of Moving Objects by Long Term Video Analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.36, issue.6, pp.1187-1200, 2014.
DOI : 10.1109/TPAMI.2013.242

A. Papazoglou and V. Ferrari, Fast Object Segmentation in Unconstrained Video, 2013 IEEE International Conference on Computer Vision (ICCV), 2013.
DOI : 10.1109/ICCV.2013.223

R. Pascanu, T. Mikolov, and Y. Bengio, On the difficulty of training recurrent neural networks, ICML, 2013.

V. Patraucean, A. Handa, and R. Cipolla, Spatio-temporal video autoencoder with differentiable memory, 2016.

F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross et al., A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.85

P. O. Pinheiro, T. Y. Lin, R. Collobert, and P. Dollár, Learning to Refine Object Segments, ECCV, 2016.
DOI : 10.1007/978-3-319-46448-0_5

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS, 2015; extended version in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, issue.6, 2017.
DOI : 10.1109/TPAMI.2016.2577031

J. Revaud, P. Weinzaepfel, Z. Harchaoui, and C. Schmid, EpicFlow: Edge-preserving interpolation of correspondences for optical flow, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298720

URL : https://hal.archives-ouvertes.fr/hal-01097477

O. Ronneberger, P. Fischer, and T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation, MICCAI, 2015.
DOI : 10.1007/978-3-319-24574-4_28

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Nature, vol.323, issue.6088, pp.533-536, 1986.
DOI : 10.1038/323533a0

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision, vol.115, issue.3, pp.211-252, 2015.
DOI : 10.1007/s11263-015-0816-y

X. Shi, Z. Chen, H. Wang, D. Y. Yeung, W. Wong et al., Convolutional LSTM network: A machine learning approach for precipitation nowcasting, NIPS, 2015.

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, NIPS, 2014.

N. Srivastava, E. Mansimov, and R. Salakhutdinov, Unsupervised learning of video representations using LSTMs, ICML, 2015.

N. Sundaram, T. Brox, and K. Keutzer, Dense Point Trajectories by GPU-Accelerated Large Displacement Optical Flow, ECCV, 2010.
DOI : 10.1007/978-3-642-15549-9_32

B. Taylor, V. Karasev, and S. Soatto, Causal video object segmentation from persistence of occlusions, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7299055

P. Tokmakov, K. Alahari, and C. Schmid, Learning Motion Patterns in Videos, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.64

URL : https://hal.archives-ouvertes.fr/hal-01427480

P. Tokmakov, K. Alahari, and C. Schmid, Learning video object segmentation with visual memory, ICCV, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01511145

P. H. Torr, Geometric motion segmentation and model selection, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol.356, issue.1740, pp.1321-1340, 1998.
DOI : 10.1098/rsta.1998.0224

C. Vogel, K. Schindler, and S. Roth, 3D Scene Flow Estimation with a Piecewise Rigid Scene Model, International Journal of Computer Vision, vol.115, issue.1, pp.1-28, 2015.
DOI : 10.1007/s11263-015-0806-0

W. Wang, J. Shen, and F. Porikli, Saliency-aware geodesic video object segmentation, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298961

A. Wedel, T. Brox, T. Vaudrey, C. Rabe, U. Franke et al., Stereoscopic Scene Flow Computation for 3D Motion Understanding, International Journal of Computer Vision, vol.95, issue.1, pp.29-51, 2011.
DOI : 10.1007/s11263-010-0404-0

P. J. Werbos, Backpropagation through time: what it does and how to do it, Proc. IEEE, pp.1550-1560, 1990.
DOI : 10.1109/5.58337

C. Xu and J. J. Corso, LIBSVX: A Supervoxel Library and Benchmark for Early Video Processing, International Journal of Computer Vision, vol.119, issue.3, pp.272-290, 2016.
DOI : 10.1007/s11263-016-0906-5

D. Zhang, O. Javed, and M. Shah, Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions, 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
DOI : 10.1109/CVPR.2013.87