A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Neural Information Processing Systems (NIPS), 2012.
DOI : 10.1145/3065386
URL : http://dl.acm.org/ft_gateway.cfm?id=3065386&type=pdf

C. Szegedy, A. Toshev, and D. Erhan, Deep neural networks for object detection, Neural Information Processing Systems (NIPS), 2013.

P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus et al., OverFeat: Integrated recognition, localization and detection using convolutional networks, International Conference on Learning Representations (ICLR), 2014.

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.81
URL : http://arxiv.org/pdf/1311.2524

K. He, X. Zhang, S. Ren, and J. Sun, Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.37, issue.9, pp.1904-1916, 2015.
DOI : 10.1109/TPAMI.2015.2389824
URL : http://arxiv.org/pdf/1406.4729

R. Girshick, Fast R-CNN, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.169

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, You Only Look Once: Unified, Real-Time Object Detection, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.91
URL : http://arxiv.org/pdf/1506.02640

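W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed et al., SSD: Single shot multibox detector, European Conference on Computer Vision (ECCV), 2016.

M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, The Pascal visual object classes (VOC) challenge, International Journal of Computer Vision, vol.88, issue.2, pp.303-338, 2010.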

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision, vol.115, issue.3, pp.211-252, 2015.
DOI : 10.1007/s11263-015-0816-y
URL : http://dspace.mit.edu/bitstream/1721.1/104944/1/11263_2015_Article_816.pdf

T. Lin, M. Maire, S. Belongie, L. D. Bourdev, R. B. Girshick et al., Microsoft COCO: Common Objects in Context, European Conference on Computer Vision (ECCV), 2014.
DOI : 10.1007/978-3-319-10602-1_48
URL : http://arxiv.org/pdf/1405.0312.pdf

J. Hoffman, S. Guadarrama, E. Tzeng, R. Hu, J. Donahue et al., LSDA: Large scale detection through adaptation, Neural Information Processing Systems (NIPS), 2014.

B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, Object detectors emerge in deep scene CNNs, International Conference on Learning Representations (ICLR), 2015.

T. Deselaers and V. Ferrari, Visual and semantic similarity in ImageNet, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995474
URL : http://research.google.com/pubs/archive/37065.pdf

Y. Tang, J. Wang, B. Gao, E. Dellandrea, R. Gaizauskas et al., Large Scale Semi-Supervised Object Detection Using Visual and Semantic Knowledge Transfer, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.233
URL : https://hal.archives-ouvertes.fr/hal-01488579

D. Crandall and D. Huttenlocher, Weakly Supervised Learning of Part-Based Spatial Models for Visual Object Recognition, European Conference on Computer Vision (ECCV), 2006.

O. Chum and A. Zisserman, An Exemplar Model for Learning Object Classes, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383050

C. Galleguillos, B. Babenko, A. Rabinovich, and S. Belongie, Weakly supervised object recognition and localization with stable segmentations, European Conference on Computer Vision (ECCV), 2008.
DOI : 10.1007/978-3-540-88682-2_16
URL : http://vision.ucsd.edu/sites/default/files/galleguillos_eccv08_0.pdf

M. Nguyen, L. Torresani, F. De la Torre, and C. Rother, Weakly supervised discriminative localization and classification: a joint learning process, International Conference on Computer Vision (ICCV), 2009.
DOI : 10.1109/iccv.2009.5459426
URL : http://www.ri.cmu.edu/pub_files/2009/10/SegSVM_ICCV09_submittedFinal.pdf

P. Siva and T. Xiang, Weakly supervised object detector learning with model drift detection, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126261
URL : http://www.psiva.ca/Publications/ICCV2011.pdf

M. Pandey and S. Lazebnik, Scene recognition and weakly supervised object localization with deformable part-based models, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126383
URL : http://www.cs.unc.edu/~lazebnik/publications/megha_iccv2011.pdf

P. Siva, C. Russell, and T. Xiang, In Defence of Negative Mining for Annotating Weakly Labelled Data, European Conference on Computer Vision (ECCV), 2012.
DOI : 10.1007/978-3-642-33712-3_43

T. Deselaers, B. Alexe, and V. Ferrari, Weakly Supervised Localization and Learning with Generic Knowledge, International Journal of Computer Vision, vol.100, issue.3, pp.275-293, 2012.
DOI : 10.1007/s11263-012-0538-3

Z. Shi, T. M. Hospedales, and T. Xiang, Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.371
URL : http://arxiv.org/pdf/1705.03372

Y. Tang, X. Wang, E. Dellandrea, S. Masnou, and L. Chen, Fusing generic objectness and deformable part-based models for weakly supervised object detection, 2014 IEEE International Conference on Image Processing (ICIP), 2014.
DOI : 10.1109/ICIP.2014.7025827
URL : https://hal.archives-ouvertes.fr/hal-01301105

H. Bilen, M. Pedersoli, and T. Tuytelaars, Weakly supervised object detection with posterior regularization, British Machine Vision Conference (BMVC), 2014.
DOI : 10.5244/c.28.52

H. O. Song, R. Girshick, S. Jegelka, J. Mairal, Z. Harchaoui et al., On learning to localize objects with minimal supervision, International Conference on Machine Learning (ICML), 2014.
URL : https://hal.archives-ouvertes.fr/hal-00996849

H. Bilen, M. Pedersoli, and T. Tuytelaars, Weakly supervised object detection with convex clustering, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298711
URL : https://lirias.kuleuven.be/bitstream/123456789/511404/1/3966_final_OA.pdf

C. Wang, K. Huang, W. Ren, J. Zhang, and S. Maybank, Large-Scale Weakly Supervised Object Localization via Latent Category Learning, IEEE Transactions on Image Processing, vol.24, issue.4, pp.1371-1385, 2015.
DOI : 10.1109/TIP.2015.2396361

Y. Tang, X. Wang, E. Dellandrea, and L. Chen, Weakly Supervised Learning of Deformable Part-Based Models for Object Detection via Region Proposals, IEEE Transactions on Multimedia, vol.19, issue.2, pp.1-1, 2016.
DOI : 10.1109/TMM.2016.2614862
URL : https://hal.archives-ouvertes.fr/hal-01488575

J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders, Selective Search for Object Recognition, International Journal of Computer Vision, vol.104, issue.2, pp.154-171, 2013.
DOI : 10.1007/s11263-013-0620-5
URL : http://www.science.uva.nl/research/publications/2011/vandeSandeICCV2011/vandesande_iccv2011.pdf

M. Cheng, Z. Zhang, W. Lin, and P. Torr, BING: Binarized Normed Gradients for Objectness Estimation at 300fps, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.414

C. L. Zitnick and P. Dollár, Edge Boxes: Locating Object Proposals from Edges, European Conference on Computer Vision (ECCV), 2014.
DOI : 10.1007/978-3-319-10602-1_26
URL : http://research.microsoft.com/en-us/um/people/larryz/ZitnickDollarECCV14edgeBoxes.pdf

M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Is object localization for free? - Weakly-supervised learning with convolutional neural networks, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298668
URL : https://hal.archives-ouvertes.fr/hal-01015140

H. Bilen and A. Vedaldi, Weakly Supervised Deep Detection Networks, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.311
URL : http://arxiv.org/pdf/1511.02853

B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, Learning Deep Features for Discriminative Localization, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.319
URL : http://arxiv.org/pdf/1512.04150

L. Shao, F. Zhu, and X. Li, Transfer learning for visual categorization: A survey, IEEE Transactions on Neural Networks and Learning Systems (TNNLS), pp.1019-1034, 2015.

J. Donahue, J. Hoffman, E. Rodner, K. Saenko, and T. Darrell, Semi-supervised Domain Adaptation with Instance Constraints, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.92
URL : http://www.cv-foundation.org/openaccess/content_cvpr_2013/papers/Donahue_Semi-supervised_Domain_Adaptation_2013_CVPR_paper.pdf

M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.222
URL : https://hal.archives-ouvertes.fr/hal-00911179

M. Rochan and Y. Wang, Weakly supervised localization of novel objects using appearance transfer, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7299060
URL : http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Rochan_Weakly_Supervised_Localization_2015_CVPR_paper.pdf

X. Shu, G. Qi, J. Tang, and J. Wang, Weakly-Shared Deep Transfer Networks for Heterogeneous-Domain Knowledge Propagation, Proceedings of the 23rd ACM international conference on Multimedia, MM '15, 2015.

Y. Zhu, Y. Chen, Z. Lu, S. J. Pan, G. Xue et al., Heterogeneous transfer learning for image classification, AAAI Conference on Artificial Intelligence (AAAI), 2011.

Y. Lu, L. Chen, A. Saidi, E. Dellandrea, and Y. Wang, Discriminative Transfer Learning Using Similarities and Dissimilarities, IEEE Transactions on Neural Networks and Learning Systems (TNNLS), pp.1-14, 2017.
DOI : 10.1109/TNNLS.2017.2705760

K. K. Singh, F. Xiao, and Y. J. Lee, Track and Transfer: Watching Videos to Simulate Strong Human Supervision for Weakly-Supervised Object Detection, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.386
URL : http://arxiv.org/pdf/1604.05766

A. Frome, G. Corrado, J. Shlens, S. Bengio, J. Dean et al., DeViSE: A deep visual-semantic embedding model, Neural Information Processing Systems (NIPS), 2013.

Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, Greedy layer-wise training of deep networks, Neural Information Processing Systems (NIPS), 2007.

C. Fellbaum (Ed.), WordNet: An Electronic Lexical Database, 1998.

C. Leacock and M. Chodorow, Combining local context and WordNet similarity for word sense identification, WordNet: An Electronic Lexical Database, pp.265-283, 1998.

P. Resnik, Using information content to evaluate semantic similarity in a taxonomy, Proceedings of the International Joint Conference for Artificial Intelligence (IJCAI-95), 1995.

D. Lin, An information-theoretic definition of similarity, International Conference on Machine Learning (ICML), 1998.

T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality, Neural Information Processing Systems (NIPS), 2013.

J. Pennington, R. Socher, and C. D. Manning, GloVe: Global Vectors for Word Representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
DOI : 10.3115/v1/D14-1162
URL : http://nlp.stanford.edu/projects/glove/glove.pdf

T. Mikolov, W. Yih, and G. Zweig, Linguistic regularities in continuous space word representations, Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2013.

I. Misra, A. Shrivastava, and M. Hebert, Watch and learn: Semisupervised learning of object detectors from videos, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/cvpr.2015.7298982
URL : http://arxiv.org/pdf/1505.05769

C. Rosenberg, M. Hebert, and H. Schneiderman, Semi-Supervised Self-Training of Object Detection Models, 2005 Seventh IEEE Workshops on Applications of Computer Vision (WACV/MOTION'05), Volume 1, 2005.
DOI : 10.1109/ACVMOT.2005.107

Y. Yang, G. Shu, and M. Shah, Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.216

P. Agrawal, R. Girshick, and J. Malik, Analyzing the Performance of Multilayer Neural Networks for Object Recognition, European Conference on Computer Vision (ECCV), 2014.
DOI : 10.1007/978-3-319-10584-0_22

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long et al., Caffe: Convolutional Architecture for Fast Feature Embedding, Proceedings of the ACM International Conference on Multimedia, MM '14, 2014.
DOI : 10.1145/2647868.2654889

M. Zeiler and R. Fergus, Visualizing and Understanding Convolutional Networks, European Conference on Computer Vision (ECCV), 2014.
DOI : 10.1007/978-3-319-10590-1_53
URL : http://cs.nyu.edu/%7Efergus/papers/zeilerECCV2014.pdf

M. Rohrbach, M. Stark, G. Szarvas, I. Gurevych, and B. Schiele, What helps where – and why? Semantic relatedness for knowledge transfer, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540121

S. Rothe and H. Schütze, AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015.
DOI : 10.3115/v1/P15-1173
URL : http://arxiv.org/pdf/1507.01127

B. Gao, E. Dellandrea, and L. Chen, Music sparse decomposition onto a MIDI dictionary of musical words and its application to music mood classification, 2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI), 2012.
DOI : 10.1109/CBMI.2012.6269798
URL : https://hal.archives-ouvertes.fr/hal-01353057

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, International Conference on Learning Representations (ICLR), 2015.

C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, Rethinking the Inception Architecture for Computer Vision, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.308
URL : http://arxiv.org/pdf/1512.00567

K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.90
URL : http://arxiv.org/pdf/1512.03385

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Region-Based Convolutional Networks for Accurate Object Detection and Segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, issue.1, pp.142-158, 2016.
DOI : 10.1109/TPAMI.2015.2437384