G. Awad, J. Fiscus, M. Michel, D. Joy, W. Kraaij et al., TRECVID 2016: Evaluating video search, video event detection, localization, and hyperlinking, Proceedings of TRECVID, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01854776

E. Apostolidis, V. Mezaris, M. Sahuguet, B. Huet, B. Červenková et al., Automatic fine-grained hyperlinking of videos within a closed collection using scene segmentation, Proceedings of the 22nd ACM international conference on Multimedia, pp.1033-1036, 2014.
DOI : 10.1145/2647868.2655041

P. Galuščáková, M. Batko, J. Cech, J. Matas, D. Novák et al., Visual Descriptors in Methods for Video Hyperlinking, Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, ICMR '17, pp.294-300, 2017.
DOI : 10.2307/3001968

V. Vukotić, C. Raymond, and G. Gravier, Bidirectional joint representation learning with symmetrical deep neural networks for multimodal and crossmodal applications, Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, pp.343-346, 2016.

Y. Yu, H. Ko, J. Choi, and G. Kim, End-to-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3261-3269, 2017.
DOI : 10.1109/CVPR.2017.347
URL : http://arxiv.org/pdf/1610.02947

A. Araujo, J. Chaves, R. Angst, and B. Girod, Temporal aggregation for large-scale query-by-image video retrieval, Image Processing (ICIP), 2015 IEEE International Conference on, pp.1519-1522, 2015.
DOI : 10.1109/icip.2015.7351054

X. Wu, A. G. Hauptmann, and C.-W. Ngo, Practical elimination of near-duplicates from web video search, Proceedings of the 15th international conference on Multimedia, MULTIMEDIA '07, pp.218-227, 2007.
DOI : 10.1145/1291233.1291280

V. Vukotić, C. Raymond, and G. Gravier, Multimodal and crossmodal representation learning from textual and visual features with bidirectional deep neural networks for video hyperlinking, Proceedings of the 2016 ACM workshop on Vision and Language Integration Meets Multimedia Fusion, pp.37-44, 2016.

K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.770-778, 2016.
DOI : 10.1109/CVPR.2016.90
URL : http://arxiv.org/pdf/1512.03385

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., Going deeper with convolutions, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1-9, 2015.
DOI : 10.1109/CVPR.2015.7298594
URL : http://arxiv.org/pdf/1409.4842

S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, Aggregated Residual Transformations for Deep Neural Networks, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.634
URL : http://arxiv.org/pdf/1611.05431

A. Iscen, T. Furon, V. Gripon, M. Rabbat, and H. Jégou, Memory Vectors for Similarity Search in High-Dimensional Spaces, IEEE Transactions on Big Data, vol.4, issue.1, 2017.
DOI : 10.1109/TBDATA.2017.2677964
URL : https://hal.archives-ouvertes.fr/hal-01481220

J. Gauvain, L. Lamel, and G. Adda, The LIMSI Broadcast News transcription system, Speech Communication, vol.37, issue.1-2, pp.89-108, 2002.
DOI : 10.1016/S0167-6393(01)00061-9
URL : https://hal.archives-ouvertes.fr/hal-01434493

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality, Advances in neural information processing systems, pp.3111-3119, 2013.

Q. Le and T. Mikolov, Distributed representations of sentences and documents, Proceedings of the 31st International Conference on Machine Learning, pp.1188-1196, 2014.

R. Kiros, Y. Zhu, R. Salakhutdinov, R. Zemel et al., Skip-thought vectors, Advances in neural information processing systems, pp.3294-3302, 2015.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014.

J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, How transferable are features in deep neural networks?, Advances in neural information processing systems, pp.3320-3328, 2014.

R. Bois, V. Vukotić, R. Sicre, C. Raymond, G. Gravier et al., IRISA at TRECVID 2016: Crossmodality, multimodality and monomodality for video hyperlinking, Proceedings of TRECVID, 2016.
DOI : 10.1007/978-3-319-51814-5_16
URL : https://hal.archives-ouvertes.fr/hal-01400275