N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, The PASCAL visual object classes (VOC) challenge, International Journal of Computer Vision, vol.88, issue.2, pp.303-338, 2010.
DOI : 10.1007/s11263-009-0275-4

URL : http://www.dai.ed.ac.uk/homes/ckiw/postscript/ijcv_voc09.pdf

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision (IJCV), vol.115, issue.3, pp.211-252, 2015.
DOI : 10.1007/s11263-015-0816-y

URL : http://arxiv.org/pdf/1409.0575

Y. LeCun, The MNIST database of handwritten digits, 1998.

D. Dwibedi, I. Misra, and M. Hebert, Cut, paste and learn: Surprisingly easy synthesis for instance detection, Proceedings of the International Conference on Computer Vision (ICCV), 2017.
DOI : 10.1109/iccv.2017.146

URL : http://arxiv.org/pdf/1708.01642

A. Gupta, A. Vedaldi, and A. Zisserman, Synthetic data for text localisation in natural images, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/cvpr.2016.254

URL : http://arxiv.org/pdf/1604.06646

G. Georgakis, A. Mousavian, A. C. Berg, and J. Kosecka, Synthesizing training data for object detection in indoor scenes, 2017.
DOI : 10.15607/rss.2017.xiii.043

URL : https://doi.org/10.15607/rss.2017.xiii.043

T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona et al., Microsoft COCO: Common objects in context, Proceedings of the European Conference on Computer Vision (ECCV), 2014.
DOI : 10.1007/978-3-319-10602-1_48

URL : http://arxiv.org/pdf/1405.0312.pdf

N. Dvornik, J. Mairal, and C. Schmid, Modeling visual context is key to augmenting object detection datasets, Proceedings of the European Conference on Computer Vision (ECCV), 2018.
DOI : 10.1007/978-3-030-01258-8_23

URL : https://hal.archives-ouvertes.fr/hal-01844474

N. Dvornik, K. Shmelkov, J. Mairal, and C. Schmid, BlitzNet: A real-time deep network for scene understanding, Proceedings of the International Conference on Computer Vision (ICCV), 2017.
DOI : 10.1109/iccv.2017.447

URL : https://hal.archives-ouvertes.fr/hal-01573361

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, Advances in Neural Information Processing Systems (NIPS), 2015.
DOI : 10.1109/tpami.2016.2577031

URL : http://arxiv.org/pdf/1506.01497

A. Torralba and P. Sinha, Statistical context priming for object detection, Proceedings of the International Conference on Computer Vision (ICCV), 2001.
DOI : 10.1109/iccv.2001.937604

URL : http://web.mit.edu/torralba/www/iccv2001.pdf

A. Torralba, Contextual priming for object detection, International Journal of Computer Vision, vol.53, issue.2, pp.169-191, 2003.

P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, Object detection with discriminatively trained part-based models, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol.32, pp.1627-1645, 2010.

M. J. Choi, J. J. Lim, A. Torralba, and A. S. Willsky, Exploiting hierarchical context on a large database of object categories, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.

S. Gould, R. Fulton, and D. Koller, Decomposing a scene into geometric and semantically consistent regions, Proceedings of the International Conference on Computer Vision (ICCV), 2009.
DOI : 10.1109/iccv.2009.5459211

URL : http://www.cs.jhu.edu/~misha/ReadingSeminar/Papers/Gould09.pdf

R. Girshick, Fast R-CNN, Proceedings of the International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/iccv.2015.169

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed et al., SSD: Single shot multibox detector, Proceedings of the European Conference on Computer Vision (ECCV), 2016.
DOI : 10.1007/978-3-319-46448-0_2

URL : http://arxiv.org/pdf/1512.02325

W. Chu and D. Cai, Deep feature based contextual model for object detection, Neurocomputing, vol.275, pp.1035-1042, 2018.
DOI : 10.1016/j.neucom.2017.09.048

URL : http://arxiv.org/pdf/1604.04048

S. Bell, C. L. Zitnick, K. Bala, and R. Girshick, Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/cvpr.2016.314

URL : http://arxiv.org/pdf/1512.04143

C. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg, DSSD: Deconvolutional single shot detector, 2017.

S. K. Divvala, D. Hoiem, J. H. Hays, A. A. Efros, and M. Hebert, An empirical study of context in object detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.

E. Barnea and O. Ben-Shahar, On the utility of context (or the lack thereof) for object detection, 2017.

R. Yu, X. Chen, V. I. Morariu, and L. S. Davis, The role of context selection in object detection, British Machine Vision Conference (BMVC), 2016.

B. Yao and L. Fei-Fei, Modeling mutual context of object and human pose in human-object interaction activities, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.

X. Ren and J. Malik, Learning a classification model for segmentation, Proceedings of the International Conference on Computer Vision (ICCV), 2003.

X. He, R. S. Zemel, and D. Ray, Learning and incorporating top-down cues in image segmentation, Proceedings of the European Conference on Computer Vision (ECCV), 2006.

J. Yang, B. Price, S. Cohen, and M. Yang, Context driven scene parsing with attention to rare classes, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.

J. Shotton, J. Winn, C. Rother, and A. Criminisi, TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation, Proceedings of the European Conference on Computer Vision (ECCV), 2006.

T. Leung and J. Malik, Representing and recognizing the visual appearance of materials using three-dimensional textons, International Journal of Computer Vision (IJCV), vol.43, pp.29-44, 2001.

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

V. Badrinarayanan, A. Kendall, and R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2017.

F. Yu and V. Koltun, Multi-scale context aggregation by dilated convolutions, International Conference on Learning Representations (ICLR), 2016.

L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol.40, pp.834-848, 2018.

G. Lin, C. Shen, A. van den Hengel, and I. Reid, Exploring context with deep structured models for semantic segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol.40, pp.1352-1366, 2018.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (NIPS), 2012.

M. Frid-Adar, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan, Synthetic data augmentation using GAN for improved liver lesion classification, 2018.

X. Peng, B. Sun, K. Ali, and K. Saenko, Learning deep object detectors from 3D models, Proceedings of the International Conference on Computer Vision (ICCV), 2015.

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, You only look once: Unified, real-time object detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

Z. Zhong, L. Zheng, G. Kang, S. Li, and Y. Yang, Random erasing data augmentation, 2017.

C. Sakaridis, D. Dai, and L. Van Gool, Semantic foggy scene understanding with synthetic data, International Journal of Computer Vision (IJCV), vol.126, pp.973-992, 2018.

A. Handa, V. Patraucean, V. Badrinarayanan, S. Stent, and R. Cipolla, Understanding real world indoor scenes with synthetic data, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

J. McCormac, A. Handa, S. Leutenegger, and A. J. Davison, SceneNet RGB-D: Can 5M synthetic images beat generic ImageNet pre-training on indoor segmentation?, Proceedings of the International Conference on Computer Vision (ICCV), 2017.

W. Qiu and A. Yuille, UnrealCV: Connecting computer vision to Unreal Engine, Proceedings of the European Conference on Computer Vision (ECCV), 2016.

K. Karsch, V. Hedau, D. Forsyth, and D. Hoiem, Rendering synthetic objects into legacy photographs, ACM Transactions on Graphics (TOG), vol.30, issue.6, p.157, 2011.

Y. Movshovitz-Attias, T. Kanade, and Y. Sheikh, How useful is photo-realistic rendering for visual learning?, Proceedings of the European Conference on Computer Vision (ECCV), 2016.

H. Su, C. R. Qi, Y. Li, and L. J. Guibas, Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3D model views, Proceedings of the International Conference on Computer Vision (ICCV), 2015.

S. Sankaranarayanan, Y. Balaji, A. Jain, S. N. Lim, and R. Chellappa, Learning from synthetic data: Addressing domain shift for semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

R. Barth, J. Hemming, and E. J. van Henten, Improved part segmentation performance by optimising realism of synthetic images using cycle generative adversarial networks, 2018.

L. Sixt, B. Wild, and T. Landgraf, RenderGAN: Generating realistic labeled data, Frontiers in Robotics and AI, vol.5, p.66, 2018.

Z. Liao, A. Farhadi, Y. Wang, I. Endres, and D. Forsyth, Building a dictionary of image fragments, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

Y. Zhou, Y. Zhu, Q. Ye, Q. Qiu, and J. Jiao, Weakly supervised instance segmentation using class peak response, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

A. Khoreva, R. Benenson, J. H. Hosang, M. Hein, and B. Schiele, Simple does it: Weakly supervised instance and semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

P. Pérez, M. Gangnet, and A. Blake, Poisson image editing, ACM Transactions on Graphics (SIGGRAPH'03), vol.22, pp.313-318, 2003.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, Focal loss for dense object detection, Proceedings of the International Conference on Computer Vision (ICCV), 2017.

O. Ronneberger, P. Fischer, and T. Brox, U-Net: Convolutional networks for biomedical image segmentation, 2015.

J. Yang, J. Lu, D. Batra, and D. Parikh, A faster PyTorch implementation of Faster R-CNN, 2017.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, International Conference on Learning Representations (ICLR), 2015.

D. Kingma and J. Ba, Adam: A method for stochastic optimization, International Conference on Learning Representations (ICLR), 2015.

L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, Semantic image segmentation with deep convolutional nets and fully connected CRFs, International Conference on Learning Representations (ICLR), 2015.