L. Baraldi, F. Paci, G. Serra, L. Benini, and R. Cucchiara, Gesture recognition using wearable vision sensors to enhance visitors' museum experiences, IEEE Sensors Journal, vol.15, issue.5, pp.2705-2714, 2015.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout : a simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.

Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in neural information processing systems, pp.1097-1105, 2012.
DOI : 10.1145/3065386
URL : http://dl.acm.org/ft_gateway.cfm?id=3065386&type=pdf

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014.

K. He, X. Zhang, S. Ren, and J. Sun, Delving deep into rectifiers : Surpassing human-level performance on imagenet classification, Proceedings of the IEEE international conference on computer vision, pp.1026-1034, 2015.

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., Going deeper with convolutions, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.1-9, 2015.

F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally et al., Squeezenet : Alexnet-level accuracy with 50x fewer parameters and <0.5 MB model size, 2016.

Y. Ioannou, D. Robertson, R. Cipolla, and A. Criminisi, Deep roots : Improving cnn efficiency with hierarchical filter groups, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

X. Zhang, X. Zhou, M. Lin, and J. Sun, Shufflenet : An extremely efficient convolutional neural network for mobile devices, 2017.

R. Girshick, Fast r-cnn, Proceedings of the IEEE international conference on computer vision, pp.1440-1448, 2015.
DOI : 10.1109/iccv.2015.169

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3431-3440, 2015.

L. Zheng, Y. Yang, and Q. Tian, Sift meets cnn : A decade survey of instance retrieval, IEEE transactions on pattern analysis and machine intelligence, vol.40, pp.1224-1244, 2018.

J. Bromley, I. Guyon, Y. Lecun, E. Säckinger, and R. Shah, Signature verification using a "siamese" time delay neural network, Advances in neural information processing systems, pp.737-744, 1994.

F. Schroff, D. Kalenichenko, and J. Philbin, Facenet : A unified embedding for face recognition and clustering, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.815-823, 2015.

A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar et al., Large-scale video classification with convolutional neural networks, Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp.1725-1732, 2014.
DOI : 10.1109/cvpr.2014.223
URL : http://www.cs.cmu.edu/~rahuls/pub/cvpr2014-deepvideo-rahuls.pdf

Y. Lecun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol.521, pp.436-444, 2015.

J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan et al., Long-term recurrent convolutional networks for visual recognition and description, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.2625-2634, 2015.

S. Bitgood, When is "museum fatigue" not fatigue? Curator : The Museum Journal, vol.52, pp.193-202, 2009.

A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang et al., Mobilenets : Efficient convolutional neural networks for mobile vision applications, 2017.

J. Lanir, T. Kuflik, E. Dim, A. J. Wecker, and O. Stock, The influence of a location-aware mobile guide on museum visitors' behavior, Interacting with Computers, vol.25, issue.6, pp.443-460, 2013.

R. E. Grinter, P. M. Aoki, M. H. Szymanski, J. D. Thornton, A. Woodruff et al., Revisiting the visit : understanding how technology can shape the museum visit, Proceedings of the 2002 ACM conference on Computer supported cooperative work, pp.146-155, 2002.

B. Gammon and A. Burch, Digital technologies and the museum experience : Handheld guides and other media, vol.35, 2008.

F. Andreacola, Musée et numérique, enjeux et mutations, Revue française des sciences de l'information et de la communication, vol.5, 2014.

A. Schmidt, M. Beigl, and H. Gellersen, There is more to context than location, Computers & Graphics, vol.23, issue.6, pp.893-901, 1999.

M. Portaz, M. Kohl, J. Chevallet, G. Quénot, and P. Mulhem, Object instance identification with fully convolutional networks, Multimedia Tools and Applications, pp.1-18, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01802287

M. Portaz, M. Kohl, G. Quénot, and J. Chevallet, Fully convolutional network and region proposal for instance identification with egocentric vision, Proceedings of the IEEE International Conference on Computer Vision, pp.2383-2391, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01887959

M. Portaz, J. Poignant, M. Budnik, P. Mulhem, J. Chevallet et al., Construction et évaluation d'un corpus pour la recherche d'instances d'images muséales, CORIA, pp.17-34, 2017.

M. Portaz, P. Mulhem, and J. Chevallet, Étude préliminaire à la recherche de photographies muséales en mobilité, CORIA 2016 COnférence en Recherche d'Information et Applications, pp.335-344, 2016.

M. Portaz, M. Budnik, P. Mulhem, and J. Poignant, MRIM-LIG at imageclef 2016 scalable concept image annotation task, CLEF (Working Notes), pp.363-370, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01572645

S. Knell, The shape of things to come : museums in the technological landscape, Museums in a digital age, vol.1, issue.3, p.435, 2010.

P. F. Marty and K. B. Jones, Museum informatics : People, information, and technology in museums, vol.2, 2008.

P. F. Marty, My lost museum : User expectations and motivations for creating personal digital collections on museum websites, Library & information science research, vol.33, pp.211-219, 2011.

E. Hooper-Greenhill, Museums and the Shaping of Knowledge, 1992.

S. Keene, Becoming digital, Museum Management and Curatorship, vol.15, issue.3, pp.299-313, 1996.

J. H. Falk, Identity and the museum visitor experience, Routledge, 2016.

T. Kuflik, O. Stock, M. Zancanaro, A. Gorfinkel, S. Jbara et al., A visitor's guide in an active museum : Presentations, communications, and reflection, Journal on Computing and Cultural Heritage (JOCCH), vol.3, issue.3, p.11, 2011.

J. A. Evans and P. Sterry, Portable computers & interactive multimedia : a new paradigm for interpreting museum collections, Archives and Museum Informatics, vol.13, pp.113-126, 1999.

A. Woodruff, P. M. Aoki, R. E. Grinter, A. Hurst, M. H. Szymanski et al., Eavesdropping on electronic guidebooks : Observing learning resources in shared listening environments, 2002.

S. Angliss, Talking sense, Museum Practice Magazine, vol.34, pp.46-47, 2006.

D. Petrelli and E. Not, User-centred design of flexible hypermedia for a mobile guide : Reflections on the hyperaudio experience, User Modeling and User-Adapted Interaction, vol.15, issue.3-4, pp.303-338, 2005.

P. Pierroux, I. Krange, and I. Sem, Bridging contexts and interpretations : Mobile blogging on art museum field trips, MedieKultur : Journal of media and communication research, vol.27, issue.50, p.18, 2011.

A. Weilenmann, T. Hillman, and B. Jungselius, Instagram at the museum : communicating the museum experience through social photo sharing, Proceedings of the SIGCHI conference on Human factors in computing systems, pp.1843-1852, 2013.

A. Kuusik, S. Roche, and F. Weis, Smartmuseum : Cultural content recommendation system for mobile users, Computer Sciences and Convergence Information Technology, 2009. ICCIT'09. Fourth International Conference on, pp.477-482, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00782151

F. Sparacino, The museum wearable : Real-time sensor-driven understanding of visitors' interests for personalized visually-augmented museum experiences, Museums and the Web, 2002.

Y. Bengio, Learning deep architectures for AI, Foundations and trends® in Machine Learning, vol.2, pp.1-127, 2009.

Y. Lecun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard et al., Backpropagation applied to handwritten zip code recognition, Neural computation, vol.1, issue.4, pp.541-551, 1989.

G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, Densely connected convolutional networks, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.4700-4708, 2017.

J. Redmon and A. Farhadi, Yolov3 : An incremental improvement, arXiv preprint, 2018.

S. Ren, K. He, R. Girshick, and J. Sun, Faster r-cnn : Towards real-time object detection with region proposal networks, Advances in neural information processing systems, pp.91-99, 2015.

K. Osako, R. Singh, and B. Raj, Complex recurrent neural networks for denoising speech signals, Applications of Signal Processing to Audio and Acoustics (WASPAA), pp.1-5, 2015.

D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg et al., Deep speech 2 : End-to-end speech recognition in English and Mandarin, International Conference on Machine Learning, pp.173-182, 2016.

P. Nakov, A. Ritter, S. Rosenthal, F. Sebastiani, and V. Stoyanov, Semeval-2016 task 4 : Sentiment analysis in twitter, Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pp.1-18, 2016.

R. Raina, A. Madhavan, and A. Ng, Large-scale deep unsupervised learning using graphics processors, Proceedings of the 26th annual international conference on machine learning, pp.873-880, 2009.

X. Glorot, A. Bordes, and Y. Bengio, Deep sparse rectifier neural networks, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp.315-323, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00752497

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., Imagenet : A large-scale hierarchical image database, Computer Vision and Pattern Recognition, pp.248-255, 2009.

K. Hornik, Approximation capabilities of multilayer feedforward networks, Neural networks, vol.4, pp.251-257, 1991.

Y. Bengio, Practical recommendations for gradient-based training of deep architectures, Neural networks : Tricks of the trade, pp.437-478, 2012.

D. Masters and C. Luschi, Revisiting small batch training for deep neural networks, 2018.

C. E. Shannon, A mathematical theory of communication, ACM SIGMOBILE mobile computing and communications review, vol.5, issue.1, pp.3-55, 2001.

K. Kawaguchi, Deep learning without poor local minima, Advances in Neural Information Processing Systems, pp.586-594, 2016.

Y. A. Lecun, L. Bottou, G. B. Orr, and K. Müller, Efficient backprop, Neural networks : Tricks of the trade, pp.9-48, 2012.

S. J. Hanson and L. Y. Pratt, Comparing biases for minimal network construction with back-propagation, Advances in neural information processing systems, pp.177-185, 1989.

A. Krogh and J. Hertz, A simple weight decay can improve generalization, Advances in neural information processing systems, pp.950-957, 1992.

S. Ioffe and C. Szegedy, Batch normalization : Accelerating deep network training by reducing internal covariate shift, 2015.

C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, Rethinking the inception architecture for computer vision, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.2818-2826, 2016.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.770-778, 2016.

J. Hu, L. Shen, and G. Sun, Squeeze-and-excitation networks, vol.7, 2017.

C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, Understanding deep learning requires rethinking generalization, 2016.

J. Sánchez and F. Perronnin, High-dimensional signature compression for large-scale image classification, Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp.1665-1672, 2011.

K. He, X. Zhang, S. Ren, and J. Sun, Identity mappings in deep residual networks, European conference on computer vision, pp.630-645, 2016.

C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, Inception-v4, inception-resnet and the impact of residual connections on learning, AAAI, vol.4, p.12, 2017.

B. Hassibi and D. G. Stork, Second order derivatives for network pruning : Optimal brain surgeon, Advances in neural information processing systems, pp.164-171, 1993.

S. Han, J. Pool, J. Tran, and W. Dally, Learning both weights and connections for efficient neural network, Advances in neural information processing systems, pp.1135-1143, 2015.

S. Han, H. Mao, and W. Dally, Deep compression : Compressing deep neural networks with pruning, trained quantization and huffman coding, 2015.

W. Chen, J. Wilson, S. Tyree, K. Weinberger, and Y. Chen, Compressing neural networks with the hashing trick, International Conference on Machine Learning, pp.2285-2294, 2015.

S. Lloyd, Least squares quantization in pcm, IEEE transactions on information theory, vol.28, issue.2, pp.129-137, 1982.

M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, Xnor-net : Imagenet classification using binary convolutional neural networks, European Conference on Computer Vision, pp.525-542, 2016.

T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona et al., Microsoft coco : Common objects in context, European conference on computer vision, pp.740-755, 2014.

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.580-587, 2014.

Q. V. Le, Building high-level features using large scale unsupervised learning, Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pp.8595-8598, 2013.

B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, Object detectors emerge in deep scene cnns, 2014.

B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, Learning deep features for discriminative localization, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2921-2929, 2016.

A. Mikulik, M. Perdoch, O. Chum, and J. Matas, Learning vocabularies over a fine quantization, International journal of computer vision, vol.103, issue.1, pp.163-175, 2013.

G. Tolias and H. Jégou, Visual query expansion with or without geometry : refining local descriptors by feature aggregation, Pattern recognition, vol.47, issue.10, pp.3466-3476, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00971267

A. Gordo, J. Almazán, J. Revaud, and D. Larlus, Deep image retrieval : Learning global representations for image search, European Conference on Computer Vision, pp.241-257, 2016.

D. G. Lowe, Object recognition from local scale-invariant features, Proceedings of the Seventh IEEE International Conference on Computer Vision, vol.2, pp.1150-1157, 1999.

D. G. Lowe, Distinctive image features from scale-invariant keypoints, International journal of computer vision, vol.60, pp.91-110, 2004.

V. Ferrari, T. Tuytelaars, and L. Van Gool, Simultaneous object recognition and segmentation by image exploration, European Conference on Computer Vision, pp.40-54, 2004.

K. Mikolajczyk and C. Schmid, A performance evaluation of local descriptors, IEEE transactions on pattern analysis and machine intelligence, vol.27, pp.1615-1630, 2005.
URL : https://hal.archives-ouvertes.fr/inria-00548227

E. Karami, S. Prasad, and M. Shehata, Image matching using sift, surf, brief and orb : performance comparison for distorted images, 2017.

K. Mikolajczyk and C. Schmid, Indexing based on scale invariant interest points, Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, vol.1, pp.525-531, 2001.
URL : https://hal.archives-ouvertes.fr/inria-00548276

L. Juan and O. Gwun, A comparison of sift, pca-sift and surf, International Journal of Image Processing (IJIP), vol.3, issue.4, pp.143-152, 2009.

L. Chiu, T. Chang, J. Chen, and N. Chang, Fast sift design for real-time visual feature extraction, IEEE Transactions on Image Processing, vol.22, issue.8, pp.3158-3167, 2013.

Y. Ke and R. Sukthankar, Pca-sift : A more distinctive representation for local image descriptors, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol.2, 2004.

H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, Speeded-up robust features (SURF), Computer vision and image understanding, vol.110, pp.346-359, 2008.

M. Calonder, V. Lepetit, C. Strecha, and P. Fua, Brief : Binary robust independent elementary features, European conference on computer vision, pp.778-792, 2010.

E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, Orb : An efficient alternative to sift or surf, Computer Vision (ICCV), 2011 IEEE international conference on, pp.2564-2571, 2011.

E. Rosten and T. Drummond, Machine learning for high-speed corner detection, European conference on computer vision, pp.430-443, 2006.

J. Sivic and A. Zisserman, Video google : A text retrieval approach to object matching in videos, ICCV, vol.2, pp.1470-1477, 2003.

Y. Zhang, Z. Jia, and T. Chen, Image retrieval with geometry-preserving visual phrases, Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp.809-816, 2011.

D. Nister and H. Stewenius, Scalable recognition with a vocabulary tree, Computer vision and pattern recognition, vol.2, pp.2161-2168, 2006.

J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, Object retrieval with large vocabularies and fast spatial matching, Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on, pp.1-8, 2007.

H. Jegou, M. Douze, and C. Schmid, Hamming embedding and weak geometric consistency for large scale image search, European conference on computer vision, pp.304-317, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00316866

F. Perronnin, Y. Liu, J. Sánchez, and H. Poirier, Large-scale image retrieval with compressed fisher vectors, Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pp.3384-3391, 2010.

H. Jégou, M. Douze, C. Schmid, and P. Pérez, Aggregating local descriptors into a compact image representation, Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pp.3304-3311, 2010.

O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman, Total recall : Automatic query expansion with a generative feature model for object retrieval, Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pp.1-8, 2007.

R. Arandjelović and A. Zisserman, Three things everyone should know to improve object retrieval, Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pp.2911-2918, 2012.

M. Perdoch, O. Chum, and J. Matas, Efficient representation of local geometry for large scale object retrieval, Computer Vision and Pattern Recognition, pp.9-16, 2009.

A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, Cnn features off-the-shelf : an astounding baseline for recognition, Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp.806-813, 2014.

M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Learning and transferring mid-level image representations using convolutional neural networks, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.1717-1724, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00911179

J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang et al., Decaf : A deep convolutional activation feature for generic visual recognition, International conference on machine learning, pp.647-655, 2014.

H. Azizpour, A. Sharif Razavian, J. Sullivan, A. Maki, and S. Carlsson, From generic to specific deep representations for visual recognition, Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp.36-45, 2015.

A. Babenko and V. Lempitsky, Aggregating local deep features for image retrieval, Proceedings of the IEEE international conference on computer vision, pp.1269-1277, 2015.

A. Sharif Razavian, J. Sullivan, A. Maki, and S. Carlsson, A baseline for visual instance retrieval with deep convolutional networks, International Conference on Learning Representations, 2015.

Y. Kalantidis, C. Mellina, and S. Osindero, Cross-dimensional weighting for aggregated deep convolutional features, European Conference on Computer Vision, pp.685-701, 2016.

G. Tolias, R. Sicre, and H. Jégou, Particular object retrieval with integral max-pooling of cnn activations, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01842218

F. Perronnin and D. Larlus, Fisher vectors meet neural networks : A hybrid classification architecture, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.3743-3752, 2015.

Y. Gong, L. Wang, R. Guo, and S. Lazebnik, Multi-scale orderless pooling of deep convolutional activation features, European conference on computer vision, pp.392-407, 2014.

M. Paulin, M. Douze, Z. Harchaoui, J. Mairal, F. Perronnin et al., Local convolutional features with unsupervised training for image retrieval, Proceedings of the IEEE international conference on computer vision, pp.91-99, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01207966

A. Babenko, A. Slesarev, A. Chigorin, and V. Lempitsky, Neural codes for image retrieval, European conference on computer vision, pp.584-599, 2014.

F. Radenović, G. Tolias, and O. Chum, Cnn image retrieval learns from bow : Unsupervised fine-tuning with hard examples, European conference on computer vision, pp.3-20, 2016.

P. Baldi and Y. Chauvin, Neural networks for fingerprint recognition, Neural Computation, vol.5, issue.3, pp.402-418, 1993.

J. Liu, J. Luo, and M. Shah, Recognizing realistic actions from videos "in the wild", Computer vision and pattern recognition, pp.1996-2003, 2009.

H. Wang, M. M. Ullah, A. Klaser, I. Laptev, and C. Schmid, Evaluation of local spatio-temporal features for action recognition, BMVC 2009-British Machine Vision Conference, pp.124-125, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00439769

P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie, Behavior recognition via sparse spatio-temporal features, 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp.65-72, 2005.

H. Wang, A. Kläser, C. Schmid, and C. Liu, Action recognition by dense trajectories, Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp.3169-3176, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00583818

I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, Computer Vision and Pattern Recognition, pp.1-8, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00548659

M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt, Sequential deep learning for human action recognition, International Workshop on Human Behavior Understanding, pp.29-39, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01354493

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, Advances in neural information processing systems, pp.568-576, 2014.

S. Hochreiter, The vanishing gradient problem during learning recurrent neural nets and problem solutions, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol.6, issue.02, pp.107-116, 1998.

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural computation, vol.9, issue.8, pp.1735-1780, 1997.

K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares et al., Learning phrase representations using rnn encoder-decoder for statistical machine translation, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01433235

J. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga et al., Beyond short snippets : Deep networks for video classification, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.4694-4702, 2015.

N. Srivastava, E. Mansimov, and R. Salakhutdinov, Unsupervised learning of video representations using lstms, International conference on machine learning, pp.843-852, 2015.

L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal et al., Video description generation incorporating spatiotemporal features and a soft-attention mechanism, 2015.

H. Jégou and O. Chum, Negative evidences and co-occurences in image retrieval : The benefit of pca and whitening, Computer Vision-ECCV 2012, pp.774-787, 2012.

P. Turcot and D. G. Lowe, Better matching with fewer features : The selection of useful features in large database recognition problems, Computer Vision Workshops (ICCV Workshops), pp.2109-2116, 2009.

N. Ma, X. Zhang, H. Zheng, and J. Sun, Shufflenet v2 : Practical guidelines for efficient cnn architecture design, 2018.

J. Poignant, M. Budnik, H. Bredin, C. Barras, M. Stefas et al., The camomile collaborative annotation platform for multi-modal, multilingual and multi-media documents, LREC 2016 Conference, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01350096