T. Akiba, S. Suzuki, and K. Fukuda, Extremely large minibatch SGD: training resnet-50 on imagenet in 15 minutes, 2017.

B. Alexe, T. Deselaers, and V. Ferrari, What is an object?, The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, pp.73-80, 2010.

B. Alexe, T. Deselaers, and V. Ferrari, Measuring the objectness of image windows, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.11, pp.2189-2202, 2012.

H. Abu Alhaija, S. K. Mustikovela, L. Mescheder, A. Geiger, and C. Rother, Augmented reality meets computer vision: Efficient data generation for urban driving scenes, International Journal of Computer Vision (IJCV), vol.126, issue.9, pp.961-972, 2018.

P. Ammirato, P. Poirson, E. Park, J. Kosecka, and A. C. Berg, A dataset for developing and benchmarking active vision, IEEE International Conference on Robotics and Automation (ICRA), 2017.

A. Angelova, A. Krizhevsky, V. Vanhoucke, A. Ogale, and D. Ferguson, Real-time pedestrian detection with deep network cascades, Proceedings of the British Machine Vision Conference, vol.2, p.4, 2015.

A. Antoniou, A. J. Storkey, and H. Edwards, Data augmentation generative adversarial networks. CoRR, 2017.

S.-H. Bae, Y. Lee, Y. Jo, Y. Bae, and J. Hwang, Rank of experts: Detection network ensemble, 2017.

Y. Bai, Y. Zhang, M. Ding, and B. Ghanem, SOD-MTGAN: Small Object Detection via Multi-Task Generative Adversarial Network, Computer Vision-ECCV 2018-15th European Conference, p.16, 2018.

A. Bansal, K. Sikka, G. Sharma, R. Chellappa, and A. Divakaran, Zero-shot object detection. CoRR, abs/1804.04340, 2018.

P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-gonzalez, V. Zambaldi et al., Relational inductive biases, deep learning, and graph networks, 2018.

L. Bazzani, A. Bergamo, D. Anguelov, and L. Torresani, Self-taught object localization with deep networks, 2016 IEEE Winter Conference on Applications of Computer Vision, WACV 2016, pp.1-9, 2016.
DOI : 10.1109/wacv.2016.7477688
URL : http://arxiv.org/pdf/1409.3964

K. Behrendt and L. Novak, A Deep Learning Approach to Traffic Lights: Detection, Tracking, and Classification, 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017.
DOI : 10.1109/icra.2017.7989163

J. Beltrán, C. Guindel, F. M. Moreno, D. Cruzado, F. García et al., Birdnet: a 3d object detection framework from lidar information, 2018.

R. Benenson, M. Mathias, R. Timofte, and L. Van-gool, Pedestrian detection at 100 frames per second, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.2903-2910, 2012.
DOI : 10.1109/cvpr.2012.6248017

S. Bianco, M. Buzzelli, D. Mazzini, and R. Schettini, Deep Learning for Logo Recognition, Neurocomputing, vol.245, pp.23-30, 2017.
DOI : 10.1016/j.neucom.2017.03.051
URL : http://arxiv.org/pdf/1701.02620

H. Bilen and A. Vedaldi, Weakly Supervised Deep Detection Networks, 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, 2016.
DOI : 10.1109/cvpr.2016.311

H. Bilen, M. Pedersoli, and T. Tuytelaars, Weakly supervised object detection with convex clustering, IEEE Conference on Computer Vision and Pattern Recognition, 2015.
DOI : 10.1109/cvpr.2015.7298711
URL : https://lirias.kuleuven.be/bitstream/123456789/511404/1/3966_final_OA.pdf

B. Yang, J. Yan, Z. Lei, and S. Z. Li, Fine-grained evaluation on face detection in the wild, Automatic Face and Gesture Recognition (FG), pp.1-7, 2015.

N. Bodla, B. Singh, R. Chellappa, and L. Davis, Soft-NMS: improving object detection with one line of code, IEEE International Conference on Computer Vision, pp.5562-5570, 2017.
DOI : 10.1109/iccv.2017.593
URL : http://arxiv.org/pdf/1704.04503

L. Bourdev, S. Maji, T. Brox, and J. Malik, Detecting people using mutually consistent poselet activations, Computer Vision-ECCV 2010, 11th European Conference on Computer Vision, pp.168-181, 2010.
DOI : 10.1007/978-3-642-15567-3_13
URL : http://www.cs.berkeley.edu/%7Esmaji/papers/bmbm-poselets-eccv10.pdf

K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan, Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.95-104, 2017.
DOI : 10.1109/cvpr.2017.18
URL : http://arxiv.org/pdf/1612.05424

S. Brahmbhatt, H. I. Christensen, and J. Hays, StuffNet-Using 'Stuff' to Improve Object Detection, IEEE Winter Conf. on Applications of Computer Vision (WACV), 2017.
DOI : 10.1109/wacv.2017.109
URL : http://arxiv.org/pdf/1610.05861

M. Braun, S. Krebs, F. Flohr, and D. M. Gavrila, The eurocity persons dataset: A novel benchmark for object detection, 2018.

M. Busta, L. Neumann, and J. Matas, Deep textspotter: An end-to-end trainable scene text localization and recognition framework, IEEE International Conference on Computer Vision, pp.2223-2231, 2017.
DOI : 10.1109/iccv.2017.242

Z. Cai, Q. Fan, R. S. Feris, and N. Vasconcelos, A unified multi-scale deep convolutional neural network for fast object detection, Computer Vision-ECCV 2016-14th European Conference, pp.354-370, 2016.
DOI : 10.1007/978-3-319-46493-0_22
URL : http://arxiv.org/pdf/1607.07155

G. Cao, X. Xie, W. Yang, Q. Liao, G. Shi et al., Feature-fused SSD: fast detection for small objects, 2017.

J. Carreira and C. Sminchisescu, Constrained parametric min-cuts for automatic object segmentation, The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, pp.3241-3248, 2010.
DOI : 10.1109/cvpr.2010.5540063
URL : http://lmb.informatik.uni-freiburg.de/lectures/seminar_brox/seminar_ws1011/cvpr10_carreira1.pdf

J. Carreira and C. Sminchisescu, Cpmc: Automatic object segmentation using constrained parametric min-cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.7, pp.1312-1328, 2011.
DOI : 10.1109/tpami.2011.231

F. M. Castro, M. J. Marín-jiménez, N. Guil, C. Schmid, and K. Alahari, End-to-End Incremental Learning, Computer Vision-ECCV 2018-15th European Conference, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01849366

F. Chabot, M. Chaouch, J. Rabarisoa, C. Teulière, and T. Chateau, Deep MANTA: A coarse-to-fine many-task network for joint 2d and 3d vehicle analysis from monocular image, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.1827-1836, 2017.
DOI : 10.1109/cvpr.2017.198
URL : https://hal.archives-ouvertes.fr/hal-01653519

C. Chen, M. Liu, O. Tuzel, and J. Xiao, R-CNN for Small Object Detection, Computer Vision-ACCV 2016-13th Asian Conference on Computer Vision, vol.10115, pp.214-230, 2016.
DOI : 10.1007/978-3-319-54193-8_14

D. Chen, G. Hua, F. Wen, and J. Sun, Supervised transformer network for efficient face detection, Computer Vision-ECCV 2016-14th European Conference, 2016.
DOI : 10.1007/978-3-319-46454-1_8
URL : http://arxiv.org/pdf/1607.05477

G. Chen, Y. Ding, J. Xiao, and T. Han, Detection evolution with multi-order contextual co-occurrence, 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp.1798-1805, 2013.
DOI : 10.1109/cvpr.2013.235
URL : http://www.eecis.udel.edu/~ding/files/cvpr13.pdf

H. Chen, Y. Wang, G. Wang, and Y. Qiao, LSTD: A low-shot transfer detector for object detection, Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

K. Chen, H. Song, C. C. Loy, and D. Lin, Discover and Learn New Objects from Documentaries, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.1111-1120, 2017.
DOI : 10.1109/cvpr.2017.124
URL : http://arxiv.org/pdf/1707.09593

K. Chen, J. Wang, S. Yang, X. Zhang, Y. Xiong et al., Optimizing video object detection via a scale-time lattice, 2018.

L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.40, issue.4, pp.834-848, 2018.
DOI : 10.1109/tpami.2017.2699184
URL : http://arxiv.org/pdf/1606.00915

S.-T. Chen, C. Cornelius, J. Martin, and D. H. Chau, Robust physical adversarial attack on faster R-CNN object detector, 2018.

X. Chen, K. Kundu, Z. Zhang, H. Ma, and S. Fidler, Monocular 3d object detection for autonomous driving, 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016.
DOI : 10.1109/cvpr.2016.236

X. Chen, K. Kundu, Y. Zhu, A. G. Berneshawi, H. Ma, S. Fidler, and R. Urtasun, 3d object proposals for accurate object class detection, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems, pp.424-432, 2015.

X. Chen, H. Ma, X. Wang, and Z. Zhao, Improving object proposals with multi-thresholding straddling expansion, IEEE Conference on Computer Vision and Pattern Recognition, 2015.

X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, Multi-view 3d object detection network for autonomous driving, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.6526-6534, 2017.
DOI : 10.1109/cvpr.2017.691
URL : http://arxiv.org/pdf/1611.07759

X. Chen and A. Gupta, Spatial Memory for Context Reasoning in Object Detection, IEEE International Conference on Computer Vision, 2017.
DOI : 10.1109/iccv.2017.440
URL : http://arxiv.org/pdf/1704.04224

Y. Chen, W. Li, C. Sakaridis, D. Dai, and L. Van-gool, Domain adaptive faster R-CNN for object detection in the wild, 2018.

Y. Chen, J. Li, H. Xiao, X. Jin, S. Yan et al., Dual path networks, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, pp.4467-4475, 2017.

Y. Chen, J. Li, B. Zhou, J. Feng, and S. Yan, Weaving multi-scale context for single shot detector, 2017.

G. Cheng, P. Zhou, and J. Han, RIFD-CNN: Rotation-Invariant and Fisher Discriminative Convolutional Neural Networks for Object Detection, 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016.
DOI : 10.1109/cvpr.2016.315

G. Cheng and J. Han, A Survey on Object Detection in Optical Remote Sensing Images, ISPRS Journal of Photogrammetry and Remote Sensing, vol.117, pp.11-28, 2016.
DOI : 10.1016/j.isprsjprs.2016.03.014
URL : https://manuscript.elsevier.com/S0924271616300144/pdf/S0924271616300144.pdf

G. Cheng, P. Zhou, and J. Han, Learning rotation-invariant convolutional neural networks for object detection in vhr optical remote sensing images, IEEE Transactions on Geoscience and Remote Sensing, vol.54, issue.12, pp.7405-7415, 2016.
DOI : 10.1109/tgrs.2016.2601622

J. Cheng, L. Dong, and M. Lapata, Long short-term memory-networks for machine reading, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp.551-561, 2016.

M. Cheng, Z. Zhang, W. Lin, and P. Torr, Bing: Binarized normed gradients for objectness estimation at 300fps, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.3286-3293, 2014.

F. Chollet, Xception: Deep learning with depthwise separable convolutions, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.1800-1807, 2017.

M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler et al., The cityscapes dataset for semantic urban scene understanding, Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

G. Csurka, A comprehensive survey on domain adaptation for visual applications, Advances in Computer Vision and Pattern Recognition, pp.1-35, 2017.

E. D. Cubuk, B. Zoph, D. Mané, V. Vasudevan, and Q. V. Le, Autoaugment: Learning augmentation policies from data, 2018.

J. Dai, K. He, and J. Sun, Instance-aware semantic segmentation via multi-task network cascades, 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp.3150-3158, 2016.

J. Dai, Y. Li, K. He, and J. Sun, R-fcn: Object detection via region-based fully convolutional networks, Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems, pp.379-387, 2016.

J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang et al., Deformable convolutional networks, IEEE International Conference on Computer Vision, pp.764-773, 2017.

N. Dalal and B. Triggs, Histograms of oriented gradients for human detection, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol.1, pp.886-893, 2005.
URL : https://hal.archives-ouvertes.fr/inria-00548512

M. Delakis and C. Garcia, Text detection with convolutional neural networks, International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP), pp.290-294, 2008.

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., Imagenet: A large-scale hierarchical image database, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.248-255, 2009.

Z. Deng, H. Sun, S. Zhou, J. Zhao, and H. Zou, Toward Fast and Accurate Vehicle Detection in Aerial Images Using Coupled Region-Based Convolutional Neural Networks, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol.10, pp.3652-3664, 2017.

Z. Deng and L. Latecki, Amodal Detection of 3D Objects: Inferring 3D Bounding Boxes from 2D Ones in RGB-Depth Images, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.398-406, 2017.

T. Devries and G. W. Taylor, Dataset augmentation in feature space, CoRR, 2017.

T. Devries and G. W. Taylor, Improved regularization of convolutional neural networks with cutout, 2017.

P. Dollar, C. Wojek, B. Schiele, and P. Perona, Pedestrian detection: An evaluation of the state of the art, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.4, pp.743-761, 2012.

X. Dong, L. Zheng, F. Ma, Y. Yang, and D. Meng, Few-shot object detection. CoRR, abs/1706.08249, 2017.

T. Durand, N. Thome, and M. Cord, MANTRA: Minimum Maximum Latent Structural SVM for Image Classification and Ranking, IEEE International Conference on Computer Vision, ICCV 2015, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01343784

T. Durand, N. Thome, and M. Cord, Weldon: Weakly supervised learning of deep convolutional neural networks, 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01343785

T. Durand, T. Mordan, N. Thome, and M. Cord, WILDCAT: Weakly Supervised Learning of Deep ConvNets for Image Classification, Pointwise Localization and Segmentation, 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017.
DOI : 10.1109/cvpr.2017.631
URL : https://hal.archives-ouvertes.fr/hal-01515640

N. Dvornik, J. Mairal, and C. Schmid, Modeling Visual Context is Key to Augmenting Object Detection Datasets, Computer Vision-ECCV 2018-15th European Conference, p.18, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01844474

D. Dwibedi, Synthesizing scenes for instance detection, 2017.

D. Dwibedi, I. Misra, and M. Hebert, Cut, paste and learn: Surprisingly easy synthesis for instance detection, IEEE International Conference on Computer Vision, pp.1310-1319, 2017.
DOI : 10.1109/iccv.2017.146
URL : http://arxiv.org/pdf/1708.01642

C. Eggert, D. Zecha, S. Brehm, and R. Lienhart, Improving small object proposals for company logo detection, Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, pp.167-174, 2017.
DOI : 10.1145/3078971.3078990
URL : http://arxiv.org/pdf/1704.08881

I. Endres and D. Hoiem, Category independent object proposals, Computer Vision-ECCV 2010, 11th European Conference on Computer Vision, pp.575-588, 2010.
DOI : 10.1007/978-3-642-15555-0_42
URL : http://www-2.cs.cmu.edu/%7Edhoiem/publications/eccv2010_CategoryIndependentProposals_ian.pdf

I. Endres and D. Hoiem, Category-independent object proposals with diverse ranking, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.36, issue.2, pp.222-234, 2014.
DOI : 10.1109/tpami.2013.122
URL : http://web.engr.illinois.edu/~iendres2/publications/pami2013_proposals_preprint.pdf

M. Engelcke, D. Rao, D. Z. Wang, C. H. Tong, and I. Posner, Vote3Deep: Fast Object Detection in 3D Point Clouds Using Efficient Convolutional Neural Networks, IEEE International Conference on Robotics and Automation (ICRA), 2017.
DOI : 10.1109/icra.2017.7989161
URL : http://arxiv.org/pdf/1609.06666

M. Enzweiler and D. M. Gavrila, Monocular pedestrian detection: Survey and experiments, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.12, pp.2179-2195, 2008.
DOI : 10.1109/tpami.2008.260
URL : http://gavrila.net/pami09.pdf

M. Enzweiler and D. M. Gavrila, A multilevel mixture-of-experts framework for pedestrian classification, IEEE Transactions on Image Processing, vol.20, issue.10, pp.2967-2979, 2011.
DOI : 10.1109/tip.2011.2142006
URL : http://www.gavrila.net/tip11.pdf

D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov, Scalable Object Detection Using Deep Neural Networks, IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, 2014.
DOI : 10.1109/cvpr.2014.276
URL : http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Erhan_Scalable_Object_Detection_2014_CVPR_paper.pdf

A. Ess, B. Leibe, and L. Van-gool, Depth and appearance for mobile scene analysis, IEEE 11th International Conference on Computer Vision, ICCV, pp.1-8, 2007.
DOI : 10.1109/iccv.2007.4409092

M. Everingham, L. Van-gool, C. K. I. Williams, J. Winn, and A. Zisserman, The pascal visual object classes (voc) challenge, International Journal of Computer Vision (IJCV), vol.88, pp.303-338, 2010.
DOI : 10.1007/s11263-009-0275-4

C. Feichtenhofer, A. Pinz, and A. Zisserman, Detect to track and track to detect, IEEE International Conference on Computer Vision, pp.3038-3046, 2017.
DOI : 10.1109/iccv.2017.330
URL : http://arxiv.org/pdf/1710.03958

P. F. Felzenszwalb, R. B. Girshick, D. A. Mcallester, and D. Ramanan, Object detection with discriminatively trained part-based models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, pp.1627-1645, 2010.
DOI : 10.1109/tpami.2009.167
URL : http://people.cs.uchicago.edu/~pff/papers/lsvm-pami.pdf

R. C. Fong and A. Vedaldi, Interpretable explanations of black boxes by meaningful perturbation, IEEE International Conference on Computer Vision, pp.3449-3457, 2017.

C. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg, DSSD : Deconvolutional single shot detector. CoRR, abs/1701.06659, 2017.

Y. Fu, T. Xiang, Y. Jiang, X. Xue, L. Sigal et al., Recent advances in zero-shot recognition: Toward data-efficient understanding of visual content, IEEE Signal Processing Magazine, vol.35, issue.1, pp.112-125, 2018.

A. Gaidon, Q. Wang, Y. Cabon, and E. Vig, Virtual worlds as proxy for multi-object tracking analysis, 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016.

M. Gao, R. Yu, A. Li, V. I. Morariu, and L. S. Davis, Dynamic zoom-in network for fast object detection in large images, 2017.

C. Garcia and M. Delakis, A neural architecture for fast and robust face detection, Pattern Recognition, 2002. Proceedings. 16th International Conference on, vol.2, pp.44-47, 2002.

W. Ge, S. Yang, and Y. Yu, Multi-evidence filtering and fusion for multi-label classification, object detection and semantic segmentation based on weakly supervised learning, Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, 2018.

A. Geiger, P. Lenz, and R. Urtasun, Are we ready for autonomous driving? the kitti vision benchmark suite, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.3354-3361, 2012.

G. Georgakis, A. Mousavian, A. C. Berg, and J. Kosecka, Synthesizing training data for object detection in indoor scenes, Robotics: Science and Systems XIII, Massachusetts Institute of Technology, 2017.

D. Gerónimo, A. D. Sappa, A. López, and D. Ponsa, Adaptive image sampling and windows classification for on-board pedestrian detection, Proceedings of the 5th International Conference on Computer Vision Systems (ICVS 2007), 2007.

S. Gidaris and N. Komodakis, Attend Refine Repeat-Active Box Proposal Generation via In-Out Localization, Proceedings of the British Machine Vision Conference, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01832771

S. Gidaris and N. Komodakis, Object detection via a multi-region and semantic segmentation-aware cnn model, IEEE Conference on Computer Vision and Pattern Recognition, pp.1134-1142, 2015.

S. Gidaris and N. Komodakis, LocNet: Improving Localization Accuracy for Object Detection, 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01832507

R. Girshick, Fast r-cnn, IEEE International Conference on Computer Vision, ICCV 2015, pp.1440-1448, 2015.

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.580-587, 2014.

R. B. Girshick, F. N. Iandola, T. Darrell, and J. Malik, Deformable part models are convolutional neural networks, IEEE Conference on Computer Vision and Pattern Recognition, 2015.

A. Gonzalez-garcia, D. Modolo, and V. Ferrari, Objects as context for part detection, 2017.

I. Goodfellow, J. Pouget-abadie, M. Mirza, B. Xu, D. Warde-farley et al., Generative adversarial nets, Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems, pp.2672-2680, 2014.

I. J. Goodfellow, D. Warde-farley, M. Mirza, A. C. Courville, and Y. Bengio, Maxout networks, Proceedings of the 30th International Conference on Machine Learning, pp.16-21, 2013.

P. Goyal, P. Dollár, R. B. Girshick, P. Noordhuis, L. Wesolowski et al., Accurate, large minibatch SGD: training imagenet in 1 hour. CoRR, abs/1706.02677, 2017.

A. Gupta, A. Vedaldi, and A. Zisserman, Synthetic Data for Text Localisation in Natural Images, 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp.2315-2324, 2016.

S. Gupta, B. Hariharan, and J. Malik, Exploring person context and local scene context for object detection. CoRR, abs/1511.08177, 2015.

S. Han, H. Mao, and W. J. Dally, Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding, 2015.

W. Han, P. Khorrami, T. L. Paine, P. Ramachandran, M. Babaeizadeh et al., Seq-nms for video object detection, 2016.

K. He, X. Zhang, S. Ren, and J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.37, issue.9, pp.1904-1916, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp.770-778, 2016.

K. He, G. Gkioxari, P. Dollár, and R. Girshick, Mask r-cnn, IEEE International Conference on Computer Vision, pp.2980-2988, 2017.

T. He, Z. Tian, W. Huang, C. Shen, Y. Qiao et al., An end-to-end textspotter with explicit alignment and attention, Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, 2018.

G. Heitz and D. Koller, Learning Spatial Context-Using Stuff to Find Things, Computer Vision-ECCV 2008, 10th European Conference on Computer Vision, 2008.

P. Henderson and V. Ferrari, End-to-end training of object class detectors for mean average precision, Computer Vision-ACCV 2016-13th Asian Conference on Computer Vision, pp.198-213, 2016.

J. F. Henriques and A. Vedaldi, Warped Convolutions-Efficient Invariance to Spatial Transformations, International Conference on Machine Learning (ICML), 2017.

C. Hetang, H. Qin, S. Liu, and J. Yan, Impression network for video object detection. CoRR, abs/1712.05896, 2017.

M. Himmelsbach, A. Mueller, T. Lüttel, and H. Wünsche, Lidar-based 3d object perception, Proceedings of 1st international workshop on cognition for technical systems, vol.1, 2008.

S. Hinterstoisser, V. Lepetit, P. Wohlhart, and K. Konolige, On pre-trained image features and synthetic images for deep learning, 2017.

E. Hjelmås and B. K. Low, Face Detection: A Survey, Computer Vision and Image Understanding (CVIU), vol.83, pp.236-274, 2001.

J. Hoffman, S. Guadarrama, E. S. Tzeng, R. Hu, J. Donahue et al., Lsda: Large scale detection through adaptation, Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems, pp.3536-3544, 2014.

J. Hoffman, D. Pathak, T. Darrell, and K. Saenko, Detector discovery in the wild: Joint multiple instance and representation learning, IEEE Conference on Computer Vision and Pattern Recognition, pp.2883-2891, 2015.

D. Hoiem, Y. Chodpathumwan, and Q. Dai, Diagnosing error in object detectors, Computer Vision-ECCV 2012-12th European Conference on Computer Vision, pp.340-353, 2012.

J. Hosang, R. Benenson, and B. Schiele, A convnet for non-maximum suppression, German Conference on Pattern Recognition, pp.192-204, 2016.

J. H. Hosang, R. Benenson, and B. Schiele, How good are detection proposals, really?, British Machine Vision Conference, 2014.

J. H. Hosang, R. Benenson, and B. Schiele, Learning non-maximum suppression, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.6469-6477, 2017.

S. Houben, J. Stallkamp, J. Salmen, M. Schlipsing, and C. Igel, Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark, International Joint Conference on Neural Networks, p.1288, 2013.

A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang et al., Mobilenets: Efficient convolutional neural networks for mobile vision applications, 2017.

H. Hu, J. Gu, Z. Zhang, J. Dai, and Y. Wei, Relation networks for object detection, Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, 2018.

J. Hu, L. Shen, and G. Sun, Squeeze-and-excitation networks. CoRR, 2017.

P. Hu and D. Ramanan, Finding tiny faces, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.1522-1530, 2017.
DOI : 10.1109/cvpr.2017.166

G. Huang, S. Liu, L. Van-der-maaten, and K. Q. Weinberger, Condensenet: An efficient densenet using learned group convolutions, 2017.

G. Huang, Z. Liu, L. Van-der-maaten, and K. Q. Weinberger, Densely connected convolutional networks, 2017 IEEE Conference on Computer Vision and Pattern Recognition, vol.1, p.3, 2017.
DOI : 10.1109/cvpr.2017.243
URL : http://arxiv.org/pdf/1608.06993

J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara et al., Speed/accuracy trade-offs for modern convolutional object detectors, 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017.
DOI : 10.1109/cvpr.2017.351
URL : http://arxiv.org/pdf/1611.10012

Q. Huang, S. K. Zhou, S. You, and U. Neumann, Learning to prune filters in convolutional neural networks, 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, pp.709-718, 2018.
DOI : 10.1109/wacv.2018.00083
URL : http://arxiv.org/pdf/1801.07365

X. Huang, M. Liu, S. J. Belongie, and J. Kautz, Multimodal unsupervised image-to-image translation, 2018.

I. Hubara, M. Courbariaux, D. Soudry, R. El-yaniv, and Y. Bengio, Binarized neural networks, Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems, pp.4107-4115, 2016.

I. Hubara, M. Courbariaux, D. Soudry, R. El-yaniv, and Y. Bengio, Quantized neural networks: Training neural networks with low precision weights and activations, The Journal of Machine Learning Research, vol.18, issue.1, pp.6869-6898, 2017.

A. Humayun, F. Li, and J. Rehg, Rigor: Reusing inference in graph cuts for generating object regions, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.336-343, 2014.
DOI : 10.1109/cvpr.2014.50
URL : http://cpl.cc.gatech.edu/projects/RIGOR/pubs/humayun_CVPR_2014_rigor.pdf

B. Huval, A. Coates, and A. Y. Ng, Deep learning for class-generic object detection. CoRR, abs/1312.6885, 2013.

F. N. Iandola, M. W. Moskewicz, S. Karayev, R. B. Girshick, T. Darrell et al., Densenet: Implementing efficient convnet descriptor pyramids. CoRR, abs/1404.1869, 2014.

F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally et al., Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5 mb model size. CoRR, abs/1602.07360v3, 2016.

H. Inoue, Data augmentation by pairing samples for images classification, 2018.

N. Inoue, R. Furuta, T. Yamasaki, and K. Aizawa, Cross-domain weakly-supervised object detection through progressive domain adaptation. CoRR, abs/1803.11365, 2018.

S. Ioffe, Batch renormalization: Towards reducing minibatch dependence in batch-normalized models, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, pp.1942-1950, 2017.

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, pp.448-456, 2015.

M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, Synthetic data and artificial neural networks for natural scene text recognition. CoRR, abs/1406.2227, 2014.

M. Jaderberg, K. Simonyan, and A. Zisserman, Spatial transformer networks, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems, 2015.

V. Jain and E. Learned-miller, FDDB: A Benchmark for Face Detection in Unconstrained Settings, 2010.

J. Jeong, H. Park, and N. Kwak, Enhancement of SSD by concatenating feature maps for object detection. CoRR, abs/1705.09587, 2017.

S. Jha, N. Agarwal, and S. Agarwal, Towards improved cartoon face detection and recognition systems, CoRR, 2018.

B. Jiang, R. Luo, J. Mao, T. Xiao, and Y. Jiang, Acquisition of localization confidence for accurate object detection, 2018.

Y. Jiang, X. Zhu, X. Wang, S. Yang, W. Li et al., R2CNN: rotational region CNN for orientation robust scene text detection. CoRR, abs/1706.09579, 2017.

A. Joly and O. Buisson, Logo retrieval with a contrario visual query expansion, Proceedings of the 17th International Conference on Multimedia, pp.581-584, 2009.
DOI : 10.1145/1631272.1631361

K. A. Joshi and D. G. Thakore, A Survey on Moving Object Detection and Tracking in Video Surveillance System, International Journal of Soft Computing and Engineering (IJSCE), vol.2, issue.3, 2012.

H. Kang, M. Hebert, A. A. Efros, and T. Kanade, Data-driven objectness, IEEE Transactions on Pattern Analysis and Machine Intelligence, issue.1, pp.189-195, 2015.
DOI : 10.1109/tpami.2014.2315811

K. Kang, W. Ouyang, H. Li, and X. Wang, Object detection from video tubelets with convolutional neural networks, 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp.817-825, 2016.
DOI : 10.1109/cvpr.2016.95
URL : http://arxiv.org/pdf/1604.04053

K. Kang, H. Li, J. Yan, X. Zeng, B. Yang et al., Tubelets with Convolutional Neural Networks for Object Detection from Videos. IEEE Transactions on Circuits and Systems for Video Technology, pp.1-1, 2017.
DOI : 10.1109/tcsvt.2017.2736553
URL : http://arxiv.org/pdf/1604.02532

D. Karatzas, L. Gomez-bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov et al., ICDAR 2015 competition on robust reading, 13th International Conference on Document Analysis and Recognition (ICDAR), pp.1156-1160, 2015.

T. Karras, T. Aila, S. Laine, and J. Lehtinen, Progressive growing of GANs for improved quality, stability, and variation, International Conference on Learning Representations, 2018.

H. Katti, M. V. Peelen, and S. P. Arun, Object detection can be improved using human-derived contextual expectations, 2016.

G. Keren, M. Schmitt, T. Kehrenberg, and B. W. Schuller, Weakly supervised one-shot detection with attention siamese networks, 2018.

A. Khosla, T. Zhou, T. Malisiewicz, A. A. Efros, and A. Torralba, Undoing the damage of dataset bias, Computer Vision-ECCV 2012-12th European Conference on Computer Vision, pp.158-171, 2012.
DOI : 10.1007/978-3-642-33718-5_12
URL : http://people.csail.mit.edu/tomasz/papers/khosla_eccv2012.pdf

K. Kim, Y. Cheon, S. Hong, B. Roh, and M. Park, PVANET: deep but lightweight neural networks for real-time object detection, 2016.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.

B. F. Klare, B. Klein, E. Taborsky, A. Blanton, J. Cheney et al., Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A, IEEE Conference on Computer Vision and Pattern Recognition, pp.1931-1939, 2015.

I. Kokkinos, Ubernet: Training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.5454-5463, 2017.
DOI : 10.1109/cvpr.2017.579
URL : http://arxiv.org/pdf/1609.02132

T. Kong, A. Yao, Y. Chen, and F. Sun, HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection, 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016.
DOI : 10.1109/cvpr.2016.98
URL : http://arxiv.org/pdf/1604.00600

T. Kong, F. Sun, A. Yao, H. Liu, M. Lu et al., RON: reverse connection with objectness prior networks for object detection, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.5244-5252, 2017.

T. Kong, F. Sun, W. Huang, and H. Liu, Deep feature pyramid reconfiguration for object detection. CoRR, abs/1808.07993, 2018.

M. Kostinger, P. Wohlhart, P. M. Roth, and H. Bischof, Annotated Facial Landmarks in the Wild: A large-scale, real-world database for facial landmark localization, First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies, pp.2144-2151, 2011.

I. Krasin, T. Duerig, N. Alldrin, V. Ferrari, S. Abu-el-haija et al., Openimages: A public dataset for large-scale multi-label and multi-class image classification, 2017.

R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata et al., Visual genome: Connecting language and vision using crowdsourced dense image annotations, International Journal of Computer Vision (IJCV), vol.123, issue.1, pp.32-73, 2017.

A. Krizhevsky, One weird trick for parallelizing convolutional neural networks. CoRR, abs/1404.5997, 2014.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems, pp.1106-1114, 2012.

J. Ku, M. Mozifian, J. Lee, A. Harakeh, and S. L. Waslander, Joint 3d proposal generation and object detection from view aggregation, 2017.

K. K. Singh, F. Xiao, and Y. Lee, Track and transfer: Watching videos to simulate strong human supervision for weakly-supervised object detection, 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, pp.3548-3556, 2016.

W. Kuo, B. Hariharan, and J. Malik, Deepbox: Learning objectness with convolutional networks, IEEE International Conference on Computer Vision, ICCV 2015, pp.2479-2487, 2015.

J. D. Lafferty, A. Mccallum, and F. C. Pereira, Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proceedings of the Eighteenth International Conference on Machine Learning, ICML '01, pp.282-289, 2001.

D. Lam, R. Kuzma, K. Mcgee, S. Dooley, M. Laielli et al., xview: Objects in context in overhead imagery, 2018.

C. H. Lampert, M. B. Blaschko, and T. Hofmann, Beyond sliding windows: Object localization by efficient subwindow search, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp.24-26, 2008.

D. Laptev, N. Savinov, J. M. Buhmann, and M. Pollefeys, TI-POOLING: transformation-invariant pooling for feature learning in convolutional neural networks, 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp.289-297, 2016.

H. Law and J. Deng, Cornernet: Detecting objects as paired keypoints, Computer Vision-ECCV 2018-15th European Conference, 2018.

Y. Lecun, L. Bottou, G. B. Orr, and K. Müller, Efficient backprop, Neural Networks: Tricks of the Trade-Second Edition, pp.9-48, 2012.

B. Lee, E. Erdenee, S. Jin, M. Y. Nam, Y. G. Jung et al., Multi-class multi-object tracking using changing point detection, Computer Vision-ECCV 2016-14th European Conference, vol.9914, pp.68-83, 2016.

K. Lee, J. Choi, J. Jeong, and N. Kwak, Residual features and unified prediction network for single stage detection, 2017.

Y. Lee, H. Kim, E. Park, X. Cui, and H. Kim, Wide-residual-inception networks for real-time object detection, Intelligent Vehicles Symposium (IV), 2017 IEEE, pp.758-764, 2017.

J. Lemley, S. Bazrafkan, and P. Corcoran, Smart augmentation learning an optimal data augmentation strategy, IEEE Access, vol.5, pp.5858-5869, 2017.

B. Li, 3D Fully Convolutional Network for Vehicle Detection in Point Cloud, IROS, 2017.

B. Li, T. Wu, S. Shao, L. Zhang, and R. Chu, Object detection via end-to-end integration of aspect ratio and context aware part-based models and fully convolutional networks, 2016.

B. Li, T. Zhang, and T. Xia, Vehicle detection from 3d lidar using fully convolutional network, Robotics: Science and Systems XII, 2016.

H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua, A convolutional neural network cascade for face detection, IEEE Conference on Computer Vision and Pattern Recognition, pp.5325-5334, 2015.

H. Li, Y. Liu, W. Ouyang, and X. Wang, Zoom out-and-in network with recursive training for object proposal, 2017.

J. Li, X. Liang, S. Shen, T. Xu, J. Feng et al., Scale-aware fast r-cnn for pedestrian detection, IEEE Transactions on Multimedia, 2017.

J. Li, X. Liang, Y. Wei, T. Xu, J. Feng et al., Perceptual generative adversarial networks for small object detection, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.1951-1959, 2017.

X. Li, F. Flohr, Y. Yang, H. Xiong, M. Braun et al., A new benchmark for vision-based cyclist detection, Intelligent Vehicles Symposium (IV), pp.1028-1033, 2016.

Y. Li, H. Qi, J. Dai, X. Ji, and Y. Wei, Fully convolutional instance-aware semantic segmentation, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.4438-4446, 2017.

Y. Li, W. Ouyang, B. Zhou, K. Wang, and X. Wang, Scene graph generation from objects, phrases and caption regions, 2017.

Y. Li, J. Li, W. Lin, and J. Li, Tiny-DSOD: Lightweight Object Detection for Resource-Restricted Usages, Proceedings of the British Machine Vision Conference, 2018.

Z. Li, C. Peng, G. Yu, X. Zhang, Y. Deng et al., Light-head R-CNN: in defense of two-stage object detector, 2017.

Z. Li, Y. Chen, G. Yu, and Y. Deng, R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection, AAAI, p.8, 2018.

Z. Li, C. Peng, G. Yu, X. Zhang, Y. Deng et al., Detnet: A backbone network for object detection, 2018.

Z. Li and D. Hoiem, Learning without Forgetting, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.

Z. Li and F. Zhou, FSSD: feature fusion single shot multibox detector, 2017.

M. Liao, B. Shi, and X. Bai, Textboxes++: A single-shot oriented scene text detector. CoRR, abs/1801.02765, 2018.

M. Liao, Z. Zhu, B. Shi, G. Xia, and X. Bai, Rotation-sensitive regression for oriented scene text detection, 2018.

Y. Liao, X. Lu, C. Zhang, Y. Wang, and Z. Tang, Mutual Enhancement for Detection of Multiple Logos in Sports Videos, IEEE International Conference on Computer Vision, pp.4856-4865, 2017.

T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona et al., Microsoft coco: Common objects in context, Computer Vision-ECCV 2014-13th European Conference, pp.740-755, 2014.

T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, Feature pyramid networks for object detection, 2017 IEEE Conference on Computer Vision and Pattern Recognition, vol.1, p.4, 2017.

T. Lin, P. Goyal, R. B. Girshick, K. He, and P. Dollár, Focal loss for dense object detection, IEEE International Conference on Computer Vision, pp.2999-3007, 2017.

Z. Lin, L. S. Davis, D. S. Doermann, and D. Dementhon, Hierarchical part-template matching for human detection and segmentation, IEEE 11th International Conference on Computer Vision, pp.1-8, 2007.
DOI : 10.1109/iccv.2007.4408975

Z. Lin, M. Courbariaux, R. Memisevic, and Y. Bengio, Neural networks with few multiplications, 2015.

K. Liu and G. Mattyus, Fast multiclass vehicle detection on aerial images, IEEE Geoscience and Remote Sensing Letters, vol.12, issue.9, pp.1938-1942, 2015.

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed et al., Ssd: Single shot multibox detector, Computer Vision-ECCV 2016-14th European Conference, pp.21-37, 2016.
DOI : 10.1007/978-3-319-46448-0_2
URL : http://arxiv.org/pdf/1512.02325

Y. Liu and L. Jin, Deep matching prior network: Toward tighter multi-oriented text detection, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.3454-3461, 2017.
DOI : 10.1109/cvpr.2017.368
URL : http://arxiv.org/pdf/1703.01425

Z. Liu, L. Yuan, L. Weng, and Y. Yang, A high resolution optical satellite image dataset for ship recognition and some new baselines, ICPRAM, pp.324-331, 2017.
DOI : 10.5220/0006120603240331

D. G. Lowe, Object recognition from local scale-invariant features, Proceedings of the Seventh IEEE International Conference on Computer Vision, vol.2, pp.1150-1157, 1999.

D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision (IJCV), vol.60, issue.2, pp.91-110, 2004.

J. Lu, H. Sibai, E. Fabry, and D. A. Forsyth, Standard detectors aren't (currently) fooled by physical adversarial stop signs, 2017.

S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong et al., Icdar 2003 robust reading competitions, Seventh International Conference on Document Analysis and Recognition, pp.682-687, 2003.
DOI : 10.1109/icdar.2003.1227749
URL : https://hal.archives-ouvertes.fr/hal-01527429

J. Ma, W. Shao, H. Ye, L. Wang, H. Wang et al., Arbitrary-Oriented Scene Text Detection via Rotation Proposals, IEEE Transactions on Multimedia, pp.1-1, 2018.
DOI : 10.1109/tmm.2018.2818020
URL : http://arxiv.org/pdf/1703.01086

S. Manen, M. Guillaumin, and L. Van-gool, Prime object proposals with randomized prim's algorithm, IEEE International Conference on Computer Vision, ICCV 2013, pp.2536-2543, 2013.
DOI : 10.1109/iccv.2013.315
URL : https://lirias.kuleuven.be/bitstream/123456789/450464/1/3716_final_OA.pdf

J. Mao, T. Xiao, Y. Jiang, and Z. Cao, What can help pedestrian detection, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.6034-6043, 2017.
DOI : 10.1109/cvpr.2017.639
URL : http://arxiv.org/pdf/1705.02757

V. Y. Mariano, J. Min, J. Park, R. Kasturi, D. Mihalcik et al., Performance evaluation of object detection algorithms, International Conference on Pattern Recognition (ICPR), vol.3, pp.965-969, 2002.
DOI : 10.1109/icpr.2002.1048198

O. Maron and T. Lozano-pérez, A framework for multiple-instance learning, Advances in Neural Information Processing Systems, vol.10, pp.570-576, 1997.

M. Masana, J. Van-de-weijer, and A. D. Bagdanov, On-the-fly network pruning for object detection, 2016.

M. Masana, J. Van-de-weijer, L. Herranz, A. D. Bagdanov, and J. M. Alvarez, Domain-adaptive deep network compression, IEEE International Conference on Computer Vision, pp.4299-4307, 2017.
DOI : 10.1109/iccv.2017.460
URL : http://arxiv.org/pdf/1709.01041

F. Massa, B. C. Russell, and M. Aubry, Deep Exemplar 2D-3D Detection by Adapting from Real to Rendered Views, IEEE Conference on Computer Vision and Pattern Recognition, pp.6024-6033, 2016.
DOI : 10.1109/cvpr.2016.648
URL : https://hal.archives-ouvertes.fr/hal-01801049

O. Matan, H. S. Baird, J. Bromley, C. J. C. Burges, J. S. Denker et al., Reading handwritten digits: A zip code recognition system, IEEE Computer, vol.25, issue.7, pp.59-63, 1992.
DOI : 10.1109/2.144441
URL : http://oro.open.ac.uk/35664/1/matan-92.pdf

B. Maze, J. Adams, J. A. Duncan, N. Kalka, T. Miller et al., IARPA Janus Benchmark-C: Face Dataset and Protocol, ICB, p.8, 2018.
DOI : 10.1109/icb2018.2018.00033

J. Mccormac, A. Handa, S. Leutenegger, and A. J. Davison, Scenenet RGB-D: can 5m synthetic images beat generic imagenet pre-training on indoor segmentation, IEEE International Conference on Computer Vision, pp.2697-2706, 2017.

K. Minemura, H. Liau, A. Monrroy, and S. Kato, Lmnet: Real-time multiclass object detection on CPU using 3d lidar, 2018.

A. Mishra, S. Nandan-rai, A. Mishra, and C. V. Jawahar, IIIT-CFW: A Benchmark Database of Cartoon Faces in the Wild, 2016.

A. Mishra, K. Alahari, and C. V. Jawahar, Scene text recognition using higher order language priors, British Machine Vision Conference, BMVC 2012, 2012.
DOI : 10.5244/c.26.127
URL : https://hal.archives-ouvertes.fr/hal-00818183

I. Misra, A. Shrivastava, and M. Hebert, Watch and learn: Semi-supervised learning of object detectors from videos, IEEE Conference on Computer Vision and Pattern Recognition, pp.3593-3602, 2015.
DOI : 10.1109/cvpr.2015.7298982
URL : http://arxiv.org/pdf/1505.05769

C. Mitash, K. Wang, K. E. Bekris, and A. Boularias, Physics-aware Self-supervised Training of CNNs for Object Detection, IEEE International Conference on Robotics and Automation (ICRA), 2017.

T. Mitchell, Never-Ending Learning, Commun. ACM, vol.61, issue.5, pp.103-115, 2018.
DOI : 10.1037/e660332010-001

A. Mogelmose, M. M. Trivedi, and T. B. Moeslund, Vision-Based Traffic Sign Detection and Analysis for Intelligent Driver Assistance Systems: Perspectives and Survey, IEEE Transactions on Intelligent Transportation Systems, vol.13, pp.1484-1497, 2012.
DOI : 10.1109/tits.2012.2209421
URL : http://cvrr.ucsd.edu/publications/2012/Mogelmose_ITS2012.pdf

T. Mordan, N. Thome, M. Cord, and G. Henaff, Deformable Part-based Fully Convolutional Network for Object Detection, Proceedings of the British Machine Vision Conference, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01637070

A. Mousavian, D. Anguelov, J. Flynn, and J. Kosecka, 3d bounding box estimation using deep learning and geometry, 2017 IEEE Conference on Computer Vision and Pattern Recognition, vol.2017, pp.5632-5640, 2017.
DOI : 10.1109/cvpr.2017.597
URL : http://arxiv.org/pdf/1612.00496

D. Mrowca, M. Rohrbach, J. Hoffman, R. Hu, K. Saenko et al., Spatial semantic regularisation for large scale object detection, IEEE International Conference on Computer Vision, ICCV 2015, 2015.
DOI : 10.1109/iccv.2015.232
URL : http://arxiv.org/pdf/1510.02949

S. Mun, S. Park, D. K. Han, and H. Ko, Generative adversarial network based acoustic scene training set augmentation and selection using svm hyper-plane, Proc. DCASE, pp.93-97, 2017.

T. N. Mundhenk, G. Konjevod, W. A. Sakla, and K. Boakye, A large contextual dataset for classification, detection and counting of cars with deep learning, Computer Vision-ECCV 2016-14th European Conference, pp.785-800, 2016.

H. Nada, V. A. Sindagi, H. Zhang, and V. M. Patel, Pushing the limits of unconstrained face detection: a challenge dataset and baseline results. CoRR, abs/1804.10275, 2018.

M. Najibi, M. Rastegari, and L. S. Davis, An Iterative Grid Based Object Detector, 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016.
DOI : 10.1109/cvpr.2016.260
URL : http://arxiv.org/pdf/1512.07729

M. Najibi, P. Samangouei, R. Chellappa, and L. Davis, SSH: Single Stage Headless Face Detector, IEEE International Conference on Computer Vision, 2017.
DOI : 10.1109/iccv.2017.522
URL : http://arxiv.org/pdf/1708.03979

A. Newell, K. Yang, and J. Deng, Stacked hourglass networks for human pose estimation, Computer Vision-ECCV 2016-14th European Conference, pp.483-499, 2016.
DOI : 10.1007/978-3-319-46484-8_29
URL : http://arxiv.org/pdf/1603.06937

M. Niepert, M. Ahmed, and K. Kutzkov, Learning convolutional neural networks for graphs, International conference on machine learning, pp.2014-2023, 2016.

S. J. Nowlan and J. C. Platt, A convolutional neural network hand tracker, Advances in Neural Information Processing Systems 8, NIPS, pp.901-908, 1995.

J. Ogier du Terrail and F. Jurie, On the use of deep neural networks for the detection of small vehicles in orthoimages, IEEE International Conference on Image Processing, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01527906

K. Oksuz, B. C. Cam, E. Akbas, and S. Kalkan, Localization Recall Precision (LRP): A New Performance Metric for Object Detection, Computer Vision-ECCV 2018-15th European Conference, 2018.

M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Weakly supervised object recognition with convolutional neural networks, Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems, 2014.

M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Is object localization for free?-weakly-supervised learning with convolutional neural networks, IEEE Conference on Computer Vision and Pattern Recognition, pp.685-694, 2015.
DOI : 10.1109/cvpr.2015.7298668
URL : https://hal.archives-ouvertes.fr/hal-01015140

M. Osadchy, Y. Le-cun, and M. Miller, Synergistic face detection and pose estimation with energy-based models, Journal of Machine Learning Research, vol.8, pp.1197-1215, 2007.
DOI : 10.1007/11957959_10

W. Ouyang, X. Wang, and C. Zhang, Factors in finetuning deep model for object detection with long-tail distribution, 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016.
DOI : 10.1109/cvpr.2016.100

W. Ouyang and X. Wang, Joint deep learning for pedestrian detection, IEEE International Conference on Computer Vision, ICCV 2013, 2013.
DOI : 10.1109/iccv.2013.257
URL : http://www.ee.cuhk.edu.hk/%7Exgwang/papers/ouyangWiccv13.pdf

W. Ouyang and X. Wang, Single-pedestrian detection aided by multipedestrian detection, 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp.3198-3205, 2013.
DOI : 10.1109/cvpr.2013.411

W. Ouyang, X. Wang, X. Zeng, S. Qiu, P. Luo et al., DeepID-Net: Deformable deep convolutional neural networks for object detection, IEEE Conference on Computer Vision and Pattern Recognition, 2015.
DOI : 10.1109/cvpr.2015.7298854
URL : http://arxiv.org/pdf/1412.5661

W. Ouyang, K. Wang, X. Zhu, and X. Wang, Learning chained deep features and classifiers for cascade in object detection, 2017.

X. Ouyang, Y. Cheng, Y. Jiang, C. Li, and P. Zhou, Pedestrian-Synthesis-GAN: Generating pedestrian data in real scene and beyond, 2018.

D. P. Papadopoulos, J. R. R. Uijlings, F. Keller, and V. Ferrari, We don't need no bounding-boxes: Training object class detectors using only human verification, 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016.

D. P. Papadopoulos, J. R. R. Uijlings, F. Keller, and V. Ferrari, Training object class detectors with click supervision, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.180-189, 2017.

C. Papageorgiou and T. Poggio, A trainable system for object detection, International Journal of Computer Vision (IJCV), vol.38, issue.1, pp.15-33, 2000.

B. Peng, W. Tan, Z. Li, S. Zhang, D. Xie et al., Extreme network compression via filter group approximation. CoRR, abs/1807.11254, 2018.

C. Peng, T. Xiao, Z. Li, Y. Jiang, X. Zhang et al., Megdet: A large mini-batch object detector, 2017.

C. Peng, X. Zhang, G. Yu, G. Luo, and J. Sun, Large kernel matters - improve semantic segmentation by global convolutional network, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.1743-1751, 2017.
DOI : 10.1109/cvpr.2017.189
URL : http://arxiv.org/pdf/1703.02719

X. Peng and K. Saenko, Synthetic to real adaptation with generative correlation alignment networks, 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, pp.1982-1991, 2018.
DOI : 10.1109/wacv.2018.00219
URL : http://arxiv.org/pdf/1701.05524

X. Peng, B. Sun, K. Ali, and K. Saenko, Learning Deep Object Detectors from 3D Models, IEEE International Conference on Computer Vision, ICCV 2015, 2015.
DOI : 10.1109/iccv.2015.151
URL : http://arxiv.org/pdf/1412.7122

A. Pentland, B. Moghaddam, and T. Starner, View-based and modular eigenspaces for face recognition, Conference on Computer Vision and Pattern Recognition, pp.84-91, 1994.
DOI : 10.1109/cvpr.1994.323814

B. Pepik, R. Benenson, T. Ritschel, and B. Schiele, What is holding back convnets for detection, German Conference on Pattern Recognition, pp.517-528, 2015.
DOI : 10.1007/978-3-319-24947-6_43
URL : https://doi.org/10.1007/978-3-319-24947-6_43

L. Perez and J. Wang, The effectiveness of data augmentation in image classification using deep learning. CoRR, abs/1712.04621, 2017.

P. Pham, D. Nguyen, T. Do, T. D. Ngo, and D. Le, Evaluation of Deep Models for Real-Time Small Object Detection, ICONIP, vol.10636, pp.516-526, 2017.
DOI : 10.1007/978-3-319-70090-8_53

P. O. Pinheiro, R. Collobert, and P. Dollár, Learning to segment object candidates, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems, pp.1990-1998, 2015.

P. O. Pinheiro and R. Collobert, From Image-level to Pixel-level Labeling with Convolutional Networks, IEEE Conference on Computer Vision and Pattern Recognition, 2015.
DOI : 10.1109/cvpr.2015.7298780
URL : http://arxiv.org/pdf/1411.6228

P. O. Pinheiro, T. Lin, R. Collobert, and P. Dollár, Learning to refine object segments, Computer Vision-ECCV 2016-14th European Conference, pp.75-91, 2016.

A. D. Pon, O. Andrienko, A. Harakeh, and S. L. Waslander, A Hierarchical Deep Architecture and Mini-Batch Selection Method For Joint Traffic Sign and Light Detection, IEEE Conference on Computer and Robot Vision, 2018.

J. Pont-tuset, P. Arbelaez, J. T. Barron, F. Marques, and J. Malik, Multiscale combinatorial grouping for image segmentation and object proposal generation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, issue.1, pp.128-140, 2017.
DOI : 10.1109/tpami.2016.2537320
URL : http://upcommons.upc.edu/bitstream/2117/105287/1/1503.00848.pdf

F. M. Porikli, Integral histogram: A fast way to extract histograms in cartesian spaces, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), pp.829-836, 2005.

C. R. Qi, H. Su, K. Mo, and L. J. Guibas, Pointnet: Deep learning on point sets for 3d classification and segmentation, 2017 IEEE Conference on Computer Vision and Pattern Recognition, vol.2017, 2017.

C. R. Qi, W. Liu, C. Wu, H. Su, and L. J. Guibas, Frustum pointnets for 3d object detection from RGB-D data, 2017.

C. R. Qi, L. Yi, H. Su, and L. J. Guibas, Pointnet++: Deep hierarchical feature learning on point sets in a metric space, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, pp.5105-5114, 2017.

W. Qiu and A. L. Yuille, Unrealcv: Connecting computer vision to unreal engine, Computer Vision-ECCV 2016-14th European Conference, vol.9915, pp.909-916, 2016.

S. Rahman, S. H. Khan, and F. Porikli, Zero-shot object detection: Learning to simultaneously recognize and localize novel concepts, 2018.

E. Rahtu, J. Kannala, and M. Blaschko, Learning a category independent object detection cascade, IEEE International Conference on Computer Vision, pp.1052-1059, 2011.
DOI : 10.1109/iccv.2011.6126351
URL : https://hal.archives-ouvertes.fr/hal-00855735

A. Raj, V. P. Namboodiri, and T. Tuytelaars, Subspace Alignment Based Domain Adaptation for RCNN Detector, Proceedings of the British Machine Vision Conference, vol.11, pp.166-167, 2015.
DOI : 10.5244/c.29.166
URL : http://www.bmva.org/bmvc/2015/papers/paper166/abstract166.pdf

R. N. Rajaram, E. Ohn-bar, and M. M. Trivedi, RefineNet: Iterative refinement for accurate object localization, IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), pp.1528-1533, 2016.

P. S. Rajpura, R. S. Hegde, and H. Bojinov, Object detection using deep cnns trained on synthetic images. CoRR, abs/1706.06782, 2017.

R. Ranjan, V. M. Patel, and R. Chellappa, A deep pyramid deformable part model for face detection, IEEE 7th International Conference on Biometrics Theory, Applications and Systems, pp.1-8, 2015.
DOI : 10.1109/btas.2015.7358755
URL : http://arxiv.org/pdf/1508.04389

P. Rantalankila, J. Kannala, and E. Rahtu, Generating object segmentation proposals using global and local search, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.2417-2424, 2014.
DOI : 10.1109/cvpr.2014.310

M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, Xnor-net: Imagenet classification using binary convolutional neural networks, Computer Vision-ECCV 2016-14th European Conference, pp.525-542, 2016.
DOI : 10.1007/978-3-319-46493-0_32
URL : http://arxiv.org/pdf/1603.05279

A. J. Ratner, H. Ehrenberg, Z. Hussain, J. Dunnmon, and C. Ré, Learning to compose domain-specific transformations for data augmentation, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, pp.3236-3246, 2017.

K. S. Ray, V. K. Asari, and S. Chakraborty, Object detection by spatio-temporal analysis and tracking of the detected objects in a video with variable background, 2017.

S. Razakarivony and F. Jurie, Vehicle detection in aerial imagery: A small target detection benchmark, Journal of Visual Communication and Image Representation, vol.34, pp.187-203, 2016.
DOI : 10.1016/j.jvcir.2015.11.002
URL : https://hal.archives-ouvertes.fr/hal-01122605

E. Real, J. Shlens, S. Mazzocchi, X. Pan, and V. Vanhoucke, Youtube-boundingboxes: A large high-precision human-annotated data set for object detection in video, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.7464-7473, 2017.
DOI : 10.1109/cvpr.2017.789
URL : http://arxiv.org/pdf/1702.00824

S. J. Reddi, S. Kale, and S. Kumar, On the convergence of adam and beyond, International Conference on Learning Representations (ICLR), 2018.

J. Redmon and A. Angelova, Real-time grasp detection using convolutional neural networks, IEEE International Conference on Robotics and Automation (ICRA), 2015.
DOI : 10.1109/icra.2015.7139361
URL : http://arxiv.org/pdf/1412.3128

J. Redmon and A. Farhadi, YOLO9000: better, faster, stronger, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.6517-6525, 2017.
DOI : 10.1109/cvpr.2017.690
URL : http://arxiv.org/pdf/1612.08242

J. Redmon and A. Farhadi, Yolov3: An incremental improvement. CoRR, 2018.

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, You only look once: Unified, real-time object detection, 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp.779-788, 2016.
DOI : 10.1109/cvpr.2016.91
URL : http://arxiv.org/pdf/1506.02640

S. Ren, K. He, R. Girshick, and J. Sun, Faster r-cnn: Towards real-time object detection with region proposal networks, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems, pp.91-99, 2015.

S. Ren, K. He, R. B. Girshick, X. Zhang, and J. Sun, Object detection networks on convolutional feature maps, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, issue.7, pp.1476-1481, 2017.

M. Rochan and Y. Wang, Weakly supervised localization of novel objects using appearance transfer, IEEE Conference on Computer Vision and Pattern Recognition, 2015.
DOI : 10.1109/cvpr.2015.7299060
URL : http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Rochan_Weakly_Supervised_Localization_2015_CVPR_paper.pdf

M. Rodriguez, I. Laptev, J. Sivic, and J. Audibert, Density-aware person detection and tracking in crowds, IEEE International Conference on Computer Vision, ICCV 2011, pp.2423-2430, 2011.
DOI : 10.1109/iccv.2011.6126526
URL : https://hal.archives-ouvertes.fr/hal-00654266

S. Romberg, L. Garcia-pueyo, R. Lienhart, and R. Van-zwol, Scalable logo recognition in real-world images, Proceedings of the 1st ACM International Conference on Multimedia Retrieval, p.25, 2011.
DOI : 10.1145/1991996.1992021

A. Rosenfeld, R. Zemel, and J. K. Tsotsos, The elephant in the room, CoRR, 2018.

R. Rothe, M. Guillaumin, and L. Van-gool, Non-maximum suppression for object detection by passing messages between windows, Computer Vision-ACCV 2014-12th Asian Conference on Computer Vision, pp.290-306, 2014.

S. Roy, V. P. Namboodiri, and A. Biswas, Active learning with version spaces for object detection. CoRR, abs/1611.07285, 2016.

S. Rujikietgumjorn and R. T. Collins, Optimized pedestrian detection for multiple and occluded people, 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp.3690-3697, 2013.
DOI : 10.1109/cvpr.2013.473
URL : http://www.cse.psu.edu/%7Ercollins/Papers/cvpr2013SitapaCollins.pdf

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning internal representations by error propagation, 1985.

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision (IJCV), vol.115, issue.3, pp.211-252, 2015.
DOI : 10.1007/s11263-015-0816-y
URL : http://dspace.mit.edu/bitstream/1721.1/104944/1/11263_2015_Article_816.pdf

P. Sabzmeydani and G. Mori, Detecting pedestrians by learning shapelet features, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), pp.18-23, 2007.
DOI : 10.1109/cvpr.2007.383134
URL : http://www.cs.sfu.ca/%7Emori/research/papers/sabzmeydani_shapelet_cvpr07.pdf

M. Sadeghi and A. Farhadi, Recognition using visual phrases, The 24th IEEE Conference on Computer Vision and Pattern Recognition, pp.1745-1752, 2011.
DOI : 10.1109/cvpr.2011.5995711
URL : http://www.cs.rit.edu/%7Erlc/Courses/ImageUnderstanding/Papers/Current/visualPhrases.pdf

M. Sadeghi and D. A. Forsyth, 30hz object detection with DPM V5, Computer Vision-ECCV 2014-13th European Conference, vol.8689, pp.65-79, 2014.
DOI : 10.1007/978-3-319-10590-1_5

W. A. Sakla, G. Konjevod, and T. N. Mundhenk, Deep multimodal vehicle detection in aerial ISR imagery, 2017 IEEE Winter Conference on Applications of Computer Vision, pp.916-923, 2017.

M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. Chen, MobileNetV2: Inverted residuals and linear bottlenecks, Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, pp.4510-4520, 2018.

P. A. Savalle and S. Tsogkas, Deformable part models with cnn features, SAICSIT Conf, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01109290

H. Schneiderman and T. Kanade, Object detection using the statistics of parts, International Journal of Computer Vision (IJCV), vol.56, issue.3, pp.151-177, 2004.
DOI : 10.1023/b:visi.0000011202.85607.00
URL : http://www.cs.cmu.edu/~efros/courses/AP06/Papers/schneiderman-ijcv-04.pdf

P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus et al., Overfeat: Integrated recognition, localization and detection using convolutional networks, 2013.

P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. Lecun, Pedestrian detection with unsupervised multi-stage feature learning, 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp.3626-3633, 2013.
DOI : 10.1109/cvpr.2013.465
URL : http://arxiv.org/pdf/1212.0142

M. J. Shafiee, B. Chywl, F. Li, and A. Wong, Fast YOLO: A fast you only look once system for real-time embedded object detection in video, 2017.

Y. Shen, R. Ji, S. Zhang, W. Zuo, and Y. Wang, Generative adversarial learning towards fast weakly supervised detection, Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, 2018.

Z. Shen, Z. Liu, J. Li, Y. Jiang, Y. Chen et al., Dsod: Learning deeply supervised object detectors from scratch, IEEE International Conference on Computer Vision, vol.3, 2017.
DOI : 10.1109/iccv.2017.212
URL : http://arxiv.org/pdf/1708.01241

Z. Shen, H. Shi, R. Schmidt-feris, L. Cao, S. Yan et al., Learning object detectors from scratch with gated recurrent feature pyramids. CoRR, 2017.

B. Shi, X. Bai, and S. J. Belongie, Detecting oriented text in natural images by linking segments, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.3482-3490, 2017.

B. Shi, C. Yao, M. Liao, M. Yang, P. Xu et al., ICDAR2017 competition on reading Chinese text in the wild (RCTW-17), 2017.

X. Shi, S. Shan, M. Kan, S. Wu, and X. Chen, Realtime rotation-invariant face detection with progressive calibration networks, Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, 2018.

K. Shmelkov, C. Schmid, and K. Alahari, Incremental learning of object detectors without catastrophic forgetting, IEEE International Conference on Computer Vision, pp.3420-3429, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01573623

A. Shrivastava, A. Gupta, and R. Girshick, Training region-based object detectors with online hard example mining, 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp.761-769, 2016.
DOI : 10.1109/cvpr.2016.89
URL : http://arxiv.org/pdf/1604.03540

A. Shrivastava, R. Sukthankar, J. Malik, and A. Gupta, Beyond skip connections: Top-down modulation for object detection, 2016.

A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang et al., Learning from Simulated and Unsupervised Images through Adversarial Training, IEEE Conference on Computer Vision and Pattern Recognition, pp.2242-2251, 2017.
DOI : 10.1109/cvpr.2017.241
URL : http://arxiv.org/pdf/1612.07828

S. Silberstein, D. Levi, V. Kogan, and R. Gazit, Vision-based pedestrian detection for rear-view cameras, Intelligent Vehicles Symposium Proceedings, pp.853-860, 2014.
DOI : 10.1109/ivs.2014.6856399

D. L. Silver, Q. Yang, and L. Li, Lifelong Machine Learning Systems: Beyond Learning Algorithms, 2013 AAAI Spring Symposium, p.7, 2013.

M. Simon, S. Milz, K. Amende, and H. Gross, Complex-YOLO: Real-time 3d object detection on point clouds, 2018.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.

K. Simonyan, A. Vedaldi, and A. Zisserman, Deep inside convolutional networks: Visualising image classification models and saliency maps. CoRR, abs/1312.6034, 2013.

B. Singh and L. S. Davis, An analysis of scale invariance in object detection - SNIP, 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017.

B. Singh, H. Li, A. Sharma, and L. S. Davis, R-FCN-3000 at 30fps: Decoupling detection and classification, 2017.

B. Singh, M. Najibi, and L. S. Davis, SNIPER: efficient multi-scale training, 2018.

L. Sixt, B. Wild, and T. Landgraf, Rendergan: Generating realistic labeled data, Front. Robotics and AI, 2018.
DOI : 10.3389/frobt.2018.00066
URL : https://www.frontiersin.org/articles/10.3389/frobt.2018.00066/pdf

A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, Content-Based Image Retrieval at the End of the Early Years, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.22, issue.12, p.32, 2000.

L. W. Sommer, T. Schuchert, J. Beyerer, A. Firooz, A. Sadjadi et al., Deep learning based multi-category object detection in aerial images, SPIE Defense+ Security, 2017.
DOI : 10.1117/12.2262083

L. W. Sommer, T. Schuchert, and J. Beyerer, Fast deep vehicle detection in aerial images, 2017 IEEE Winter Conference on Applications of Computer Vision, pp.311-319, 2017.
DOI : 10.1109/wacv.2017.41

L. W. Sommer, A. Schumann, T. Schuchert, and J. Beyerer, Multi feature deconvolutional faster R-CNN for precise vehicle detection in aerial imagery, 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, pp.635-642, 2018.
DOI : 10.1109/wacv.2018.00075

H. Song, R. B. Girshick, S. Jegelka, J. Mairal, Z. Harchaoui et al., On learning to localize objects with minimal supervision, Proceedings of the 31st International Conference on Machine Learning, vol.32, pp.1611-1619, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00996849

H. Song, Y. J. Lee, S. Jegelka, and T. Darrell, Weakly-supervised discovery of visual pattern configurations, Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems, pp.1637-1645, 2014.

J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. A. Riedmiller, Striving for simplicity: The all convolutional net. CoRR, abs/1412.6806, 2014.

S. Srivastava, G. Sharma, and B. Lall, Large scale novel object discovery in 3d, 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, pp.179-188, 2018.
DOI : 10.1109/wacv.2018.00026
URL : http://arxiv.org/pdf/1701.07046

R. Stewart, M. Andriluka, and A. Ng, End-to-end people detection in crowded scenes, 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, pp.2325-2333, 2016.
DOI : 10.1109/cvpr.2016.255
URL : http://arxiv.org/pdf/1506.04878

H. Su, S. Gong, and X. Zhu, WebLogo-2M: Scalable Logo Detection by Deep Learning from the Web, ICCV Workshops, pp.270-279, 2017.
DOI : 10.1109/iccvw.2017.41

H. Su, X. Zhu, and S. Gong, Deep Learning Logo Detection with Data Expansion by Synthesising Context, IEEE Winter Conf. on Applications of Computer Vision (WACV), pp.530-539, 2017.
DOI : 10.1109/wacv.2017.65
URL : http://arxiv.org/pdf/1612.09322

H. Su, X. Zhu, and S. Gong, Open Logo Detection Challenge, Proceedings of the British Machine Vision Conference, 2018.

B. Sun and K. Saenko, From virtual to reality: Fast adaptation of virtual object detectors to real domains, British Machine Vision Conference, vol.1, p.3, 2014.
DOI : 10.5244/c.28.82
URL : http://www.bmva.org/bmvc/2014/files/abstract062.pdf

C. Sun, M. Paluri, R. Collobert, R. Nevatia, and L. Bourdev, ProNet: Learning to Propose Object-Specific Boxes for Cascaded Neural Networks, 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, 2016.
DOI : 10.1109/cvpr.2016.379
URL : http://arxiv.org/pdf/1511.03776

C. Szegedy, S. E. Reed, D. Erhan, and D. Anguelov, Scalable, high-quality object detection. CoRR, abs/1412.1441, 2014.

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., Going deeper with convolutions, IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, pp.1-9, 2015.
DOI : 10.1109/cvpr.2015.7298594
URL : http://arxiv.org/pdf/1409.4842

C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, Rethinking the inception architecture for computer vision, 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, pp.2818-2826, 2016.
DOI : 10.1109/cvpr.2016.308
URL : http://arxiv.org/pdf/1512.00567

C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, Inception-v4, inception-resnet and the impact of residual connections on learning, AAAI, vol.4, p.12, 2017.

M. Tan, B. Chen, R. Pang, V. Vasudevan, Q. V. Le et al., Platform-aware neural architecture search for mobile. CoRR, abs/1807.11626, 2018.

K. D. Tang, V. Ramanathan, F. Li, and D. Koller, Shifting Weights: Adapting Object Detectors from Image to Video, Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held, 2012.

P. Tang, X. Wang, X. Bai, and W. Liu, Multiple instance detection network with online instance classifier refinement, 2017 IEEE Conference on Computer Vision and Pattern Recognition, vol.2017, 2017.
DOI : 10.1109/cvpr.2017.326
URL : http://arxiv.org/pdf/1704.00138

S. Tang, M. Andriluka, and B. Schiele, Detection and tracking of occluded people, International Journal of Computer Vision (IJCV), vol.110, issue.1, pp.58-69, 2014.
DOI : 10.5244/c.26.9
URL : http://www.bmva.org/bmvc/2012/BMVC/paper009/paper009.pdf

S. Tang, B. Andres, M. Andriluka, and B. Schiele, Subgraph decomposition for multi-target tracking, IEEE Conference on Computer Vision and Pattern Recognition, pp.5033-5041, 2015.
DOI : 10.1109/cvpr.2015.7299138

T. Tang, S. Zhou, Z. Deng, L. Lei, and H. Zou, Arbitrary-Oriented Vehicle Detection in Aerial Imagery with Single Convolutional Neural Networks, Remote Sensing, vol.9, pp.1170-1187, 2017.

T. Tang, S. Zhou, Z. Deng, H. Zou, and L. Lei, Vehicle Detection in Aerial Images Based on Region Convolutional Neural Networks and Hard Negative Example Mining, Sensors, vol.17, pp.336-353, 2017.
DOI : 10.3390/s17020336
URL : http://www.mdpi.com/1424-8220/17/2/336/pdf

Y. Tang, J. K. Wang, B. Gao, and E. Dellandréa, Large Scale Semi-supervised Object Detection using Visual and Semantic Knowledge Transfer, 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016.
DOI : 10.1109/cvpr.2016.233
URL : https://hal.archives-ouvertes.fr/hal-01488579

F. Tanner, B. Colder, C. Pullen, D. Heagy, M. Eppolito et al., Overhead imagery research data set - an annotated data library & tools to aid in the development of computer vision algorithms, 2009 IEEE Applied Imagery Pattern Recognition Workshop (AIPR 2009), pp.1-8, 2009.

L. Taylor and G. Nitschke, Improving deep learning using generic data augmentation, 2017.

Y. Tian, X. Li, K. Wang, and F. Wang, Training and testing object detectors with virtual images, 2017.

T. Tieleman and G. Hinton, Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning, vol.4, pp.26-31, 2012.

R. Timofte, K. Zimmermann, and L. Van Gool, Multi-view traffic sign detection, recognition, and 3d localisation, Machine Vision and Applications, vol.25, pp.633-647, 2014.

T. Tommasi, N. Patricia, B. Caputo, and T. Tuytelaars, Domain Adaptation in Computer Vision Applications, Advances in Computer Vision and Pattern Recognition, pp.37-55, 2017.

A. Torralba and A. A. Efros, Unbiased look at dataset bias, The 24th IEEE Conference on Computer Vision and Pattern Recognition, pp.1521-1528, 2011.

T. Tran, T. Pham, G. Carneiro, L. Palmer, and I. Reid, A bayesian data augmentation approach for learning deep models, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, pp.2797-2806, 2017.

J. Tremblay, A. Prakash, D. Acuna, M. Brophy, V. Jampani, C. Anil, T. To, E. Cameracci, S. Boochoon, and S. Birchfield, Training deep networks with synthetic data: Bridging the reality gap by domain randomization, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2018.

J. Tremblay, T. To, and S. Birchfield, Falling things: A synthetic dataset for 3d object detection and pose estimation. CoRR, abs/1804.06534, 2018.

S. Tripathi, Z. C. Lipton, S. J. Belongie, and T. Q. Nguyen, Context matters: Refining object detection in video with recurrent neural networks, Proceedings of the British Machine Vision Conference, 2016.

Z. Tu and X. Bai, Auto-context and its application to high-level vision tasks and 3d brain image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.10, pp.1744-1757, 2010.

Z. Tu, Y. Ma, W. Liu, X. Bai, and C. Yao, Detecting texts of arbitrary orientations in natural images, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.1083-1090, 2012.

O. Tuzel, F. Porikli, and P. Meer, Pedestrian detection via classification on riemannian manifolds, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30, issue.10, pp.1713-1727, 2008.

A. Tüzkö, C. Herrmann, D. Manger, and J. Beyerer, Open set logo detection and retrieval, Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, vol.5, pp.284-292, 2018.

J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders, Selective search for object recognition, International Journal of Computer Vision (IJCV), vol.104, issue.2, pp.154-171, 2013.

R. Vaillant, C. Monrocq, and Y. LeCun, Original approach for the localisation of objects in images, IEE Proceedings-Vision, Image and Signal Processing, vol.141, pp.245-250, 1994.

K. E. A. van de Sande, J. R. R. Uijlings, T. Gevers, and A. W. M. Smeulders, Segmentation as selective search for object recognition, IEEE International Conference on Computer Vision, pp.1879-1886, 2011.

G. Van Horn, O. Mac Aodha, Y. Song, Y. Cui, C. Sun et al., The iNaturalist Species Classification and Detection Dataset, Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, 2018.

G. Varol, J. Romero, X. Martin, N. Mahmood, M. J. Black et al., Learning from synthetic humans, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.4627-4635, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01505711

A. Veit, T. Matera, L. Neumann, J. Matas, and S. J. Belongie, COCO-Text: Dataset and benchmark for text detection and recognition in natural images, 2016.

A. Vezhnevets and V. Ferrari, Object localization in imagenet by looking out of the window, Proceedings of the British Machine Vision Conference, vol.12, pp.27-28, 2015.

P. A. Viola, M. J. Jones, and D. Snow, Detecting pedestrians using patterns of motion and appearance, International Journal of Computer Vision (IJCV), vol.63, issue.2, pp.153-161, 2005.
DOI : 10.1109/iccv.2003.1238422
URL : http://www.merl.com/papers/docs/TR2003-90.pdf

S. Walk, N. Majer, K. Schindler, and B. Schiele, New features and insights for pedestrian detection, The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, pp.1030-1037, 2010.
DOI : 10.1109/cvpr.2010.5540102
URL : http://www.jdl.ac.cn/project/faceId/paperreading/Paper/hyren_20100507.pdf

F. Wan, P. Wei, J. Jiao, Z. Han, and Q. Ye, Min-entropy latent model for weakly supervised object detection, Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, 2018.

L. Wan, D. Eigen, and R. Fergus, End-to-end integration of a convolutional network, deformable parts model and non-maximum suppression, IEEE Conference on Computer Vision and Pattern Recognition, pp.851-859, 2015.

C. Wang, W. Ren, K. Huang, and T. Tan, Weakly Supervised Object Localization with Latent Category Learning, Computer Vision-ECCV 2014-13th European Conference, 2014.
DOI : 10.1007/978-3-319-10599-4_28

K. Wang and S. Belongie, Word spotting in the wild, Computer Vision-ECCV 2010, 11th European Conference on Computer Vision, pp.591-604, 2010.
DOI : 10.1007/978-3-642-15549-9_43

L. Wang, Y. Lu, H. Wang, Y. Zheng, H. Ye et al., Evolving boxes for fast vehicle detection, ICME, pp.1135-1140, 2017.
DOI : 10.1109/icme.2017.8019461
URL : http://arxiv.org/pdf/1702.00254

R. J. Wang, X. Li, S. Ao, and C. X. Ling, Pelee: A Real-Time Object Detection System on Mobile Devices, International Conference on Learning Representations (ICLR), 2018.

X. Wang, R. B. Girshick, A. Gupta, and K. He, Non-local neural networks. CoRR, 2017.

X. Wang, A. Shrivastava, and A. Gupta, A-fast-rcnn: Hard positive generation via adversary for object detection, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.3039-3048, 2017.
DOI : 10.1109/cvpr.2017.324
URL : http://arxiv.org/pdf/1704.03414

X. Wang, T. X. Han, and S. Yan, An HOG-LBP human detector with partial occlusion handling, IEEE 12th International Conference on Computer Vision, pp.32-39, 2009.
DOI : 10.1109/iccv.2009.5459207

X. Wang, T. Xiao, Y. Jiang, S. Shao, J. Sun et al., Repulsion Loss: Detecting Pedestrians in a Crowd, Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, 2018.

M. Weiler, F. A. Hamprecht, and M. Storath, Learning steerable filters for rotation equivariant cnns, Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, 2018.

L. Wen, D. Du, Z. Cai, Z. Lei, M. Chang et al., DETRAC: A new benchmark and protocol for multi-object tracking, 2015.

C. Whitelam, E. Taborsky, A. Blanton, B. Maze, J. Adams et al., Iarpa janus benchmark-b face dataset, CVPR Workshop on Biometrics, 2017.
DOI : 10.1109/cvprw.2017.87

C. Wojek, G. Dorkó, A. Schulz, and B. Schiele, Sliding-windows for rapid object class localization: A parallel technique, Joint Pattern Recognition Symposium, pp.71-81, 2008.
DOI : 10.1007/978-3-540-69321-5_8

C. Wojek, S. Walk, and B. Schiele, Multi-cue onboard pedestrian detection, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.794-801, 2009.
DOI : 10.1109/cvpr.2009.5206638

S. Woo, S. Hwang, and I. S. Kweon, Stairnet: Top-down semantic aggregation for accurate one shot detection, 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, pp.1093-1102, 2018.
DOI : 10.1109/wacv.2018.00125
URL : http://arxiv.org/pdf/1709.05788

B. Wu, F. N. Iandola, P. H. Jin, and K. Keutzer, Squeezedet: Unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops, pp.446-454, 2017.
DOI : 10.1109/cvprw.2017.60
URL : http://arxiv.org/pdf/1612.01051

B. Wu and R. Nevatia, Cluster boosted tree classifier for multi-view, multi-pose object detection, IEEE 11th International Conference on Computer Vision, pp.1-8, 2007.
DOI : 10.1109/iccv.2007.4409006
URL : http://iris.usc.edu/Outlines/papers/2007/wu-nev-iccv07.pdf

B. Wu and R. Nevatia, Detection of multiple, partially occluded humans in a single image by bayesian combination of edgelet part detectors, 10th IEEE International Conference on Computer Vision (ICCV 2005), pp.90-97, 2005.

T. Wu, B. Li, and S. Zhu, Learning and-or model to represent context and occlusion for car detection and viewpoint estimation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, pp.1829-1843, 2016.

Y. Wu and Q. Ji, Facial Landmark Detection: A Literature Survey. International Journal of Computer Vision (IJCV), To appear, 2018.

G. Xia, X. Bai, J. Ding, Z. Zhu, S. J. Belongie et al., DOTA: A large-scale dataset for object detection in aerial images. CoRR, abs/1711.10398, 2017.

Y. Xiang and S. Savarese, Estimating the aspect layout of object categories, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.

Y. Xiang, W. Choi, Y. Lin, and S. Savarese, Data-driven 3d voxel patterns for object category recognition, IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, pp.1903-1911, 2015.

Y. Xiao, C. Lu, E. Tsougenis, Y. Lu, and C. Tang, Complexity-adaptive distance metric for object proposals generation, IEEE Conference on Computer Vision and Pattern Recognition, 2015.

S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, Aggregated residual transformations for deep neural networks, 2017 IEEE Conference on Computer Vision and Pattern Recognition, vol.2017, pp.5987-5995, 2017.

H. Xu, X. Lv, X. Wang, Z. Ren, and R. Chellappa, Deep regionlets for object detection. CoRR, abs/1712.02408, 2017.

J. Xu, S. Ramos, D. Vázquez, and A. López, Domain adaptation of deformable part-based models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.36, issue.12, pp.2367-2380, 2014.

Z. Xu, X. Xu, L. Wang, R. Yang, and F. Pu, Deformable ConvNet with Aspect Ratio Constrained NMS for Object Detection in Remote Sensing Imagery. Remote Sensing, vol.9, pp.1312-1331, 2017.

J. Yan, X. Zhang, Z. Lei, and S. Z. Li, Face detection by structural models, Image and Vision Computing, vol.32, issue.10, pp.790-799, 2014.

F. Yang, W. Choi, and Y. Lin, Exploit all the layers: Fast and accurate cnn object detector with scale dependent pooling and cascaded rejection classifiers, 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, pp.2129-2137, 2016.

S. Yang, P. Luo, C. C. Loy, and X. Tang, From facial parts responses to face detection: A deep learning approach, 2015 IEEE International Conference on Computer Vision, ICCV 2015, pp.3676-3684, 2015.

S. Yang, P. Luo, C. Loy, and X. Tang, Wider face: A face detection benchmark, 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp.5525-5533, 2016.

Z. Yang and R. Nevatia, A multi-scale cascade fully convolutional network face detector, 23rd International Conference on Pattern Recognition, pp.633-638, 2016.

C. Yao, X. Bai, N. Sang, X. Zhou, S. Zhou et al., Scene text detection via holistic, multi-channel prediction, 2016.

R. Yoshihashi, T. T. Trinh, R. Kawakami, S. You, M. Iida et al., Learning multi-frame visual representation for joint detection and tracking of small objects, 2017.

Y. You, Z. Zhang, C. Hsieh, J. Demmel, and K. Keutzer, Imagenet training in minutes, Proceedings of the 47th International Conference on Parallel Processing, vol.1, pp.1-1, 2018.

F. Yu and V. Koltun, Multi-scale context aggregation by dilated convolutions, 2015.

F. Yu, V. Koltun, and T. A. Funkhouser, Dilated residual networks, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.636-644, 2017.
DOI : 10.1109/cvpr.2017.75
URL : http://arxiv.org/pdf/1705.09914

F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao et al., BDD100K: A diverse driving video database with scalable annotation tooling, 2018.

J. Yu, Y. Jiang, Z. Wang, Z. Cao, and T. S. Huang, Unitbox: An advanced object detection network, Proceedings of the 2016 ACM Conference on Multimedia Conference, MM 2016, pp.516-520, 2016.

R. Yu, X. Chen, V. I. Morariu, and L. S. Davis, The Role of Context Selection in Object Detection, Proceedings of the British Machine Vision Conference, 2016.

Y. Yuan, X. Liang, X. Wang, D. Yeung, and A. Gupta, Temporal dynamic graph lstm for action-driven video object detection, IEEE International Conference on Computer Vision, 2017.
DOI : 10.1109/iccv.2017.200
URL : http://arxiv.org/pdf/1708.00666

M. K. Yucel, Y. C. Bilge, O. Oguz, N. Ikizler-Cinbis, P. Duygulu et al., Wildest faces: Face detection and recognition in violent settings, 2018.

S. Zagoruyko and N. Komodakis, Wide residual networks, Proceedings of the British Machine Vision Conference, 2016.
DOI : 10.5244/c.30.87
URL : https://hal.archives-ouvertes.fr/hal-01832503

S. Zagoruyko, A. Lerer, T. Lin, P. O. Pinheiro, S. Gross et al., A multipath network for object detection, Proceedings of the British Machine Vision Conference, 2016.
DOI : 10.5244/c.30.15
URL : http://www.bmva.org/bmvc/2016/papers/paper015/abstract015.pdf

M. D. Zeiler, ADADELTA: an adaptive learning rate method. CoRR, abs/1212.5701, 2012.

M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, Computer Vision-ECCV 2014-13th European Conference, pp.818-833, 2014.

X. Zeng, W. Ouyang, B. Yang, J. Yan, and X. Wang, Gated Bi-directional CNN for Object Detection, Computer Vision-ECCV 2016-14th European Conference, 2016.
DOI : 10.1007/978-3-319-46478-7_22

X. Zeng, W. Ouyang, J. Yan, H. Li, T. Xiao et al., Crafting gbd-net for object detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.

Y. Zhai, J. Fu, Y. Lu, and H. Li, Feature selective networks for object detection, Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, 2018.

C. Zhang and Z. Zhang, A survey of recent advances in face detection, 2010.

D. Zhang, J. Yang, D. Ye, and G. Hua, LQ-Nets: Learned quantization for highly accurate and compact deep neural networks. CoRR, 2018.

L. Zhang, L. Lin, X. Liang, and K. He, Is faster R-CNN doing well for pedestrian detection?, Computer Vision-ECCV 2016-14th European Conference, vol.9906, pp.443-457, 2016.

S. Zhang, R. Benenson, and B. Schiele, Citypersons: A diverse dataset for pedestrian detection, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.4457-4465, 2017.

S. Zhang, J. Yang, and B. Schiele, Occluded Pedestrian Detection Through Guided Attention in CNNs, Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, p.9, 2018.

S. Zhang, X. Zhu, Z. Lei, H. Shi, X. Wang et al., S$^3$FD: Single Shot Scale-invariant Face Detector, IEEE International Conference on Computer Vision, 2017.
DOI : 10.1109/iccv.2017.30
URL : http://arxiv.org/pdf/1708.05237

S. Zhang, L. Wen, X. Bian, Z. Lei, and S. Z. Li, Occlusion-aware R-CNN: detecting pedestrians in a crowd, 2018.

S. Zhang, L. Wen, X. Bian, Z. Lei, and S. Z. Li, Single-shot refinement neural network for object detection, Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, 2018.

X. Zhang, X. Zhou, M. Lin, and J. Sun, Shufflenet: An extremely efficient convolutional neural network for mobile devices, 2017.

X. Zhang, Y. Wei, J. Feng, Y. Yang, and T. S. Huang, Adversarial complementary learning for weakly supervised object localization, Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, 2018.

X. Zhang, J. Feng, H. Xiong, and Q. Tian, Zigzag learning for weakly supervised object detection, Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, 2018.

Y. Zhang, Y. Bai, M. Ding, Y. Li, and B. Ghanem, W2f: A weakly-supervised to fully-supervised framework for object detection, Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, 2018.

Y. Zhang, K. Sohn, R. Villegas, G. Pan, and H. Lee, Improving object detection with deep convolutional networks via Bayesian optimization and structured prediction, IEEE Conference on Computer Vision and Pattern Recognition, 2015.
DOI : 10.1109/cvpr.2015.7298621
URL : http://arxiv.org/pdf/1504.03293

Z. Zhang, C. Zhang, W. Shen, C. Yao, W. Liu et al., Multi-oriented text detection with fully convolutional networks, 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp.4159-4167, 2016.
DOI : 10.1109/cvpr.2016.451
URL : http://arxiv.org/pdf/1604.04018

Z. Zhang, S. Qiao, C. Xie, W. Shen, B. Wang et al., Single-shot object detection with enriched semantics, Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, 2018.

F. Zhao, Y. Yang, H. Zhang, L. Yang, and L. Zhang, Sign text detection in street view images using an integrated feature. Multimedia Tools and Applications, 2018.
DOI : 10.1007/s11042-018-5975-8

X. Zhao, S. Liang, and Y. Wei, Pseudo mask augmented object detection, Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, 2018.

Z. Zhao, P. Zheng, S. Xu, and X. Wu, Object detection with deep learning: A review. CoRR, abs/1807.05511, 2018.

L. Zheng, C. Fu, and Y. Zhao, Extend the shallow part of single shot multibox detector via convolutional neural network, 2018.

B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, Object detectors emerge in deep scene cnns, IEEE Conference on Computer Vision and Pattern Recognition, 2015.

B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, Learning deep features for scene recognition using places database, Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems, 2014.
DOI : 10.1109/tpami.2017.2723009

B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, Learning deep features for discriminative localization, 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, pp.2921-2929, 2016.
DOI : 10.1109/cvpr.2016.319
URL : http://arxiv.org/pdf/1512.04150

P. Zhou, B. Ni, C. Geng, J. Hu, and Y. Xu, Scale-Transferrable Object Detection, Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, p.10, 2018.

S. Zhou, Z. Ni, X. Zhou, H. Wen, Y. Wu et al., Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients, 2016.

X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou et al., East: An efficient and accurate scene text detector, 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017.
DOI : 10.1109/cvpr.2017.283
URL : http://arxiv.org/pdf/1704.03155

Y. Zhou and O. Tuzel, Voxelnet: End-to-end learning for point cloud based 3d object detection. CoRR, abs/1711.06396, 2017.

H. Zhu, X. Chen, W. Dai, K. Fu, Q. Ye et al., Orientation robust object detection in aerial images using deep convolutional neural network, IEEE International Conference on Image Processing (ICIP), pp.3735-3739, 2015.
DOI : 10.1109/icip.2015.7351502

J. Zhu, T. Park, P. Isola, and A. A. Efros, Unpaired image-to-image translation using cycle-consistent adversarial networks, IEEE International Conference on Computer Vision, pp.2242-2251, 2017.
DOI : 10.1109/iccv.2017.244
URL : http://arxiv.org/pdf/1703.10593

P. Zhu, L. Wen, X. Bian, H. Ling, and Q. Hu, Vision meets drones: A challenge. CoRR, abs/1804.07437, 2018.

P. Zhu, H. Wang, T. Bolukbasi, and V. Saligrama, Zero-shot detection, 2018.

X. Zhu and D. Ramanan, Face detection, pose estimation, and landmark localization in the wild, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.2879-2886, 2012.

X. Zhu, Y. Wang, J. Dai, L. Yuan, and Y. Wei, Flow-guided feature aggregation for video object detection, IEEE International Conference on Computer Vision, pp.408-417, 2017.
DOI : 10.1109/iccv.2017.52
URL : http://arxiv.org/pdf/1703.10025

X. Zhu, Y. Xiong, J. Dai, L. Yuan, and Y. Wei, Deep feature flow for video recognition, 2017 IEEE Conference on Computer Vision and Pattern Recognition, vol.2, p.7, 2017.
DOI : 10.1109/cvpr.2017.441
URL : http://arxiv.org/pdf/1611.07715

X. Zhu, J. Dai, X. Zhu, Y. Wei, and L. Yuan, Towards high performance video object detection for mobiles, 2018.

Y. Zhu, R. Urtasun, R. Salakhutdinov, and S. Fidler, segDeepM: Exploiting segmentation and context in deep neural networks for object detection, IEEE Conference on Computer Vision and Pattern Recognition, 2015.

Z. Zhu, D. Liang, S. Zhang, X. Huang, B. Li et al., Traffic-sign detection and classification in the wild, 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp.2110-2118, 2016.
DOI : 10.1109/cvpr.2016.232

C. L. Zitnick and P. Dollar, Edge boxes: Locating object proposals from edges, Computer Vision-ECCV 2014-13th European Conference, 2014.
DOI : 10.1007/978-3-319-10602-1_26
URL : http://research.microsoft.com/en-us/um/people/larryz/ZitnickDollarECCV14edgeBoxes.pdf

Z. Zuo, B. Shuai, G. Wang, X. Liu, X. Wang et al., Learning Contextual Dependence With Convolutional Hierarchical Recurrent Neural Networks, IEEE Transactions on Image Processing, 2016.
DOI : 10.1109/tip.2016.2548241
URL : http://arxiv.org/pdf/1509.03877