E. H. Adelson, On seeing stuff: The perception of materials by humans and machines, Proc. SPIE, 2001.

V. Badrinarayanan, F. Galasso, and R. Cipolla, Label propagation in video sequences, 2010.

N. Ballas, L. Yao, C. Pal, and A. Courville, Delving deeper into convolutional networks for learning video representations, 2016.

P. Bideau and E. G. Learned-Miller, It's moving! A probabilistic model for causal motion segmentation in moving camera videos, 2016.

W. Brendel and S. Todorovic, Video object segmentation by tracking regions, 2009.

T. Brox and J. Malik, Object segmentation by long term analysis of point trajectories, 2010.

T. Brox and J. Malik, Large displacement optical flow: Descriptor matching in variational motion estimation, PAMI, vol.33, issue.3, pp.500-513, 2011.

W. Byeon, T. M. Breuel, F. Raue, and M. Liwicki, Scene labeling with LSTM recurrent neural networks, CVPR, 2015.

S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, and L. Van Gool, One-shot video object segmentation, 2017.

J. Chen, L. Yang, Y. Zhang, M. Alber, and D. Z. Chen, Combining fully convolutional and recurrent neural networks for 3d biomedical image segmentation, 2016.

L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, Semantic image segmentation with deep convolutional nets and fully connected CRFs, 2015.

L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, 2017.

K. Cho, B. van Merriënboer, Ç. Gülçehre, F. Bougares, H. Schwenk et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01433235

J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan et al., Long-term recurrent convolutional networks for visual recognition and description, 2015.

A. Dosovitskiy, P. Fischer, E. Ilg, P. Häusser, C. Hazırbaş et al., FlowNet: Learning optical flow with convolutional networks, ICCV, 2015.

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results.

A. Faktor and M. Irani, Video segmentation by non-local consensus voting, 2014.

M. Fayyaz, M. H. Saffar, M. Sabokrou, M. Fathy, R. Klette et al., STFCN: Spatio-temporal FCN for semantic video segmentation, 2016.

C. Finn, I. Goodfellow, and S. Levine, Unsupervised learning for physical interaction through video prediction, 2016.

K. Fragkiadaki, G. Zhang, and J. Shi, Video segmentation by tracing discontinuities in a trajectory embedding, 2012.

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, 2010.

A. Graves, Generating sequences with recurrent neural networks, 2013.

A. Graves, N. Jaitly, and A. Mohamed, Hybrid speech recognition with deep bidirectional LSTM, Workshop on Automatic Speech Recognition and Understanding, 2013.

A. Graves, A. Mohamed, and G. Hinton, Speech recognition with deep recurrent neural networks, ICASSP, 2013.
DOI : 10.1109/icassp.2013.6638947

A. Graves and J. Schmidhuber, Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural Networks, vol.18, issue.5, pp.602-610, 2005.
DOI : 10.1016/j.neunet.2005.06.042

M. Grundmann, V. Kwatra, M. Han, and I. Essa, Efficient hierarchical graph based video segmentation, 2010.
DOI : 10.1109/cvpr.2010.5539893
URL : http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/36247.pdf

K. He, X. Zhang, S. Ren, and J. Sun, Identity mappings in deep residual networks, pp.630-645, 2016.
DOI : 10.1007/978-3-319-46493-0_38
URL : http://arxiv.org/pdf/1603.05027

S. Hochreiter, The vanishing gradient problem during learning recurrent neural nets and problem solutions, Int. J. Uncertainty, Fuzziness and Knowledge-Based Systems, vol.6, issue.2, pp.107-116, 1998.

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.

J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proc. National Academy of Sciences, vol.79, issue.8, pp.2554-2558, 1982.
DOI : 10.1073/pnas.79.8.2554
URL : http://europepmc.org/articles/pmc346238?pdf=render

F. Huguet and F. Devernay, A variational method for scene flow estimation from stereo sequences, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00262139

E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy et al., FlowNet 2.0: Evolution of optical flow estimation with deep networks, 2017.

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, ICML, 2015.

S. D. Jain, B. Xiong, and K. Grauman, FusionSeg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos, 2017.

M. Keuper, B. Andres, and T. Brox, Motion trajectory segmentation via minimum cost multicuts, 2015.
DOI : 10.1109/iccv.2015.374

A. Khoreva, F. Galasso, M. Hein, and B. Schiele, Classifier based graph construction for video segmentation, 2015.
DOI : 10.1109/cvpr.2015.7298697

A. Khoreva, F. Perazzi, R. Benenson, B. Schiele, and A. Sorkine-Hornung, Learning video object segmentation from static images, 2017.

Y. J. Koh and C. S. Kim, Primary object segmentation in videos based on region augmentation and reduction, 2017.
DOI : 10.1109/cvpr.2017.784

P. Krähenbühl and V. Koltun, Efficient inference in fully connected CRFs with Gaussian edge potentials, 2011.

Y. J. Lee, J. Kim, and K. Grauman, Key-segments for video object segmentation, 2011.
DOI : 10.1109/iccv.2011.6126471
URL : http://vision.cs.utexas.edu/projects/keysegments/iccv2011_keysegments.pdf

J. Lezama, K. Alahari, J. Sivic, and I. Laptev, Track to the future: Spatio-temporal video segmentation with long-range motion cues, 2011.
DOI : 10.1109/cvpr.2011.6044588
URL : https://hal.archives-ouvertes.fr/hal-00817961

F. Li, T. Kim, A. Humayun, D. Tsai, and J. M. Rehg, Video segmentation by tracking many figure-ground segments, ICCV, 2013.
DOI : 10.1109/iccv.2013.273

T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona et al., Microsoft COCO: Common objects in context, 2014.
DOI : 10.1007/978-3-319-10602-1_48
URL : http://arxiv.org/pdf/1405.0312.pdf

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, 2015.

N. Mayer, E. Ilg, P. Häusser, P. Fischer, D. Cremers et al., A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation, 2016.

T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, Recurrent neural network based language model, 2010.

M. Narayana, A. R. Hanson, and E. G. Learned-Miller, Coherent motion segmentation in moving camera videos using optical flow orientations, ICCV, 2013.

J. Y. Ng, M. J. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga et al., Beyond short snippets: Deep networks for video classification, 2015.

P. Ochs and T. Brox, Higher order motion models and spectral clustering, 2012.

P. Ochs, J. Malik, and T. Brox, Segmentation of moving objects by long term video analysis, PAMI, vol.36, issue.6, pp.1187-1200, 2014.

A. Papazoglou and V. Ferrari, Fast object segmentation in unconstrained video, ICCV, 2013.

R. Pascanu, T. Mikolov, and Y. Bengio, On the difficulty of training recurrent neural networks, ICML, 2013.

V. Patraucean, A. Handa, and R. Cipolla, Spatio-temporal video autoencoder with differentiable memory, ICLR Workshop track, 2016.

F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross et al., A benchmark dataset and evaluation methodology for video object segmentation, 2016.

P. O. Pinheiro, T. Y. Lin, R. Collobert, and P. Dollár, Learning to refine object segments, 2016.

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, 2015.

J. Revaud, P. Weinzaepfel, Z. Harchaoui, and C. Schmid, EpicFlow: Edge-preserving interpolation of correspondences for optical flow, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01142656

O. Ronneberger, P. Fischer, and T. Brox, U-Net: Convolutional networks for biomedical image segmentation, 2015.

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Nature, vol.323, pp.533-536, 1986.

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet large scale visual recognition challenge, IJCV, vol.115, issue.3, pp.211-252, 2015.

X. Shi, Z. Chen, H. Wang, D. Y. Yeung, W. Wong et al., Convolutional LSTM network: A machine learning approach for precipitation nowcasting, 2015.

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, 2014.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2015.

N. Srivastava, E. Mansimov, and R. Salakhutdinov, Unsupervised learning of video representations using LSTMs, ICML, 2015.

N. Sundaram, T. Brox, and K. Keutzer, Dense point trajectories by GPU-accelerated large displacement optical flow, 2010.
DOI : 10.1007/978-3-642-15549-9_32
URL : http://nma.berkeley.edu/ark:/28722/bk00071397c

B. Taylor, V. Karasev, and S. Soatto, Causal video object segmentation from persistence of occlusions, 2015.
DOI : 10.1109/cvpr.2015.7299055
URL : http://vision.ucla.edu/papers/taylorKS15TR.pdf

T. Tieleman and G. Hinton, Lecture 6.5: RMSProp, COURSERA: Neural Networks for Machine Learning, 2012.

P. Tokmakov, K. Alahari, and C. Schmid, Learning motion patterns in videos, 2017.
DOI : 10.1109/cvpr.2017.64
URL : https://hal.archives-ouvertes.fr/hal-01427480

P. Tokmakov, K. Alahari, and C. Schmid, Learning video object segmentation with visual memory, 2017.
DOI : 10.1109/iccv.2017.480
URL : https://hal.archives-ouvertes.fr/hal-01511145

P. H. Torr, Geometric motion segmentation and model selection, Phil. Trans. Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol.356, issue.1740, pp.1321-1340, 1998.
DOI : 10.1098/rsta.1998.0224
URL : http://www.research.microsoft.com/~philtorr/Papers/Royal/royal98.ps

S. Vedula, S. Baker, P. Rander, R. Collins, and T. Kanade, Three-dimensional scene flow, PAMI, vol.27, issue.3, pp.475-480, 2005.

C. Vogel, K. Schindler, and S. Roth, 3D scene flow estimation with a piecewise rigid scene model, IJCV, vol.115, issue.1, pp.1-28, 2015.
DOI : 10.1007/s11263-015-0806-0

W. Wang, J. Shen, and F. Porikli, Saliency-aware geodesic video object segmentation, 2015.

A. Wedel, T. Brox, T. Vaudrey, C. Rabe, U. Franke et al., Stereoscopic scene flow computation for 3D motion understanding, IJCV, vol.95, issue.1, pp.29-51, 2011.
DOI : 10.1007/s11263-010-0404-0

P. J. Werbos, Backpropagation through time: What it does and how to do it, Proc. IEEE, vol.78, issue.10, pp.1550-1560, 1990.
DOI : 10.1109/5.58337
URL : https://zenodo.org/record/1262035/files/article.pdf

C. Xu and J. J. Corso, LIBSVX: A supervoxel library and benchmark for early video processing, IJCV, vol.119, issue.3, pp.272-290, 2016.
DOI : 10.1007/s11263-016-0906-5
URL : http://arxiv.org/pdf/1512.09049

D. Zhang, O. Javed, and M. Shah, Video object segmentation through spatially accurate and temporally dense extraction of primary object regions, 2013.
DOI : 10.1109/cvpr.2013.87
URL : http://crcv.ucf.edu/papers/cvpr2013/VideoObjectSegmentation.pdf