Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, Greedy layer-wise training of deep networks, Advances in neural information processing systems, 2007.

J. Dai, K. He, and J. Sun, Convolutional feature masking for joint object and stuff segmentation, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7299025

N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.177

URL : https://hal.archives-ouvertes.fr/inria-00548512

J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li et al., ImageNet: A large-scale hierarchical image database, Computer Vision and Pattern Recognition, 2009.

J. Dong, Q. Chen, S. Yan, and A. Yuille, Towards Unified Object Detection and Semantic Segmentation, Computer Vision – ECCV 2014, 2014.
DOI : 10.1007/978-3-319-10602-1_20

M. Everingham, L. Van Gool, C. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.73, issue.2, 2006.
DOI : 10.1007/s11263-009-0275-4

M. Everingham, L. Van Gool, C. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.73, issue.2, 2008.
DOI : 10.1007/s11263-009-0275-4

P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, Object detection with discriminatively trained part-based models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, 2010.

S. Gidaris and N. Komodakis, Object detection via a multi-region & semantic segmentation-aware CNN model, arXiv preprint, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01245664

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.81

B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik, Simultaneous Detection and Segmentation, Computer Vision – ECCV 2014, 2014.
DOI : 10.1007/978-3-319-10584-0_20

K. He, X. Zhang, S. Ren, and J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, 2014.

K. He, X. Zhang, S. Ren, and J. Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.123

G. E. Hinton and R. R. Salakhutdinov, Reducing the Dimensionality of Data with Neural Networks, Science, vol.313, issue.5786, 2006.
DOI : 10.1126/science.1127647

D. Hoiem, Y. Chodpathumwan, and Q. Dai, Diagnosing Error in Object Detectors, Computer Vision – ECCV 2012, 2012.
DOI : 10.1007/978-3-642-33712-3_25

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv preprint, 2015.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in neural information processing systems, 2012.

Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard et al., Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, vol.1, issue.4, 1989.
DOI : 10.1162/neco.1989.1.4.541

M. Leordeanu, A. Radu, and R. Sukthankar, Features in concert: Discriminative feature selection meets unsupervised clustering, arXiv preprint, 2014.

M. Lin, Q. Chen, and S. Yan, Network in network, CoRR, abs/1312, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01460127

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, arXiv preprint arXiv:1411, 2014.

D. G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94

R. Mottaghi, X. Chen, X. Liu, N. Cho, S. Lee et al., The Role of Context for Object Detection and Semantic Segmentation in the Wild, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.119

P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus et al., OverFeat: Integrated recognition, localization and detection using convolutional networks, 2013.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556, 2014.

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., Going deeper with convolutions. arXiv preprint, 2014.

K. E. van de Sande, J. R. Uijlings, T. Gevers, and A. W. Smeulders, Segmentation as selective search for object recognition, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126456

A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman, Multiple kernels for object detection, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459183

Y. Zhang, K. Sohn, R. Villegas, G. Pan, and H. Lee, Improving object detection with deep convolutional networks via Bayesian optimization and structured prediction, 2015.

M. D. Zeiler and R. Fergus, Visualizing and Understanding Convolutional Networks, Computer Vision – ECCV 2014, 2014.
DOI : 10.1007/978-3-319-10590-1_53

Y. Zhu, R. Urtasun, R. Salakhutdinov, and S. Fidler, segDeepM: Exploiting segmentation and context in deep neural networks for object detection, 2015.