A. Arnab, S. Jayasumana, S. Zheng, and P. Torr, Higher Order Conditional Random Fields in Deep Neural Networks, 2015.
DOI : 10.1109/CVPR.2014.119
URL : http://arxiv.org/pdf/1511.08119

N. Audebert, B. Le Saux, and S. Lefèvre, Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks, Computer Vision - ACCV 2016, pp.180-196, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01360166

V. Badrinarayanan, A. Kendall, and R. Cipolla, SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, issue.12, 2015.
DOI : 10.1109/TPAMI.2016.2644615
URL : https://doi.org/10.1109/tpami.2016.2644615

A. Boulch, DAG of convolutional networks for semantic labeling, Office national d'études et de recherches aérospatiales, 2015.

M. Campos-Taberner, A. Romero-Soriano, C. Gatta, G. Camps-Valls, A. Lagrange et al., Processing of Extremely High-Resolution LiDAR and RGB Data: Outcome of the 2015 IEEE GRSS Data Fusion Contest - Part A: 2-D Contest, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, pp.1-13, 2016.
DOI : 10.1109/JSTARS.2016.2569162
URL : https://hal.archives-ouvertes.fr/hal-01414573

K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, Return of the Devil in the Details: Delving Deep into Convolutional Nets, Proceedings of the British Machine Vision Conference 2014, pp.6-7, 2014.
DOI : 10.5244/C.28.6

L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, Proceedings of the International Conference on Learning Representations, 2015.

M. Cramer, The DGPF test on digital aerial camera evaluation - overview and test design, Photogrammetrie - Fernerkundung - Geoinformation, vol.2, pp.73-82, 2010.
DOI : 10.1127/1432-8364/2010/0041

A. Eitel, J. T. Springenberg, L. Spinello, M. Riedmiller, and W. Burgard, Multimodal deep learning for robust RGB-D object recognition, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp.681-687, 2015.
DOI : 10.1109/IROS.2015.7353446
URL : http://arxiv.org/pdf/1507.06821

M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn et al., The Pascal Visual Object Classes Challenge: A Retrospective, International Journal of Computer Vision, vol.34, issue.11, pp.98-136, 2014.
DOI : 10.1109/TPAMI.2012.204

M. Gerke, Use of the Stair Vision Library within the ISPRS 2D Semantic Labeling Benchmark (Vaihingen), Technical report, International Institute for Geo-Information Science and Earth Observation, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015 IEEE International Conference on Computer Vision (ICCV), pp.1026-1034, 2015.
DOI : 10.1109/ICCV.2015.123
URL : http://arxiv.org/pdf/1502.01852

K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.770-778, 2016.
DOI : 10.1109/CVPR.2016.90
URL : http://arxiv.org/pdf/1512.03385

S. Ioffe and C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Proceedings of the 32nd International Conference on Machine Learning, pp.448-456, 2015.

A. Lagrange, B. Le Saux, A. Beaupère, A. Boulch, A. Chan-Hon-Tong et al., Benchmarking classification of earth-observation data: From learning explicit features to convolutional networks, 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp.4173-4176, 2015.
DOI : 10.1109/igarss.2015.7326745

G. Lin, C. Shen, A. van den Hengel, and I. Reid, Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2016.348
URL : https://digital.library.adelaide.edu.au/dspace/bitstream/2440/105526/2/RA_hdl_105526.pdf

T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona et al., Microsoft COCO: Common Objects in Context, Computer Vision - ECCV 2014, number 8693 in Lecture Notes in Computer Science, pp.740-755, 2014.
DOI : 10.1007/978-3-319-10602-1_48
URL : http://arxiv.org/pdf/1405.0312.pdf

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3431-3440, 2015.
DOI : 10.1109/CVPR.2015.7298965

D. Marmanis, K. Schindler, J. D. Wegner, S. Galliani, M. Datcu et al., Classification with an edge: Improving semantic image segmentation with boundary detection, ISPRS Journal of Photogrammetry and Remote Sensing, vol.135, 2016.
DOI : 10.1016/j.isprsjprs.2017.11.009
URL : http://arxiv.org/pdf/1612.01337

D. Marmanis, J. D. Wegner, S. Galliani, K. Schindler, M. Datcu et al., Semantic Segmentation of Aerial Images with an Ensemble of CNNs, ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, vol.3, pp.473-480, 2016.

V. Mnih and G. E. Hinton, Learning to Detect Roads in High-Resolution Aerial Images, Computer Vision - ECCV 2010, number 6316 in Lecture Notes in Computer Science, pp.210-223, 2010.
DOI : 10.1007/978-3-642-15567-3_16
URL : http://learning.cs.toronto.edu/%7Ehinton/absps/road_detection.pdf

J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee et al., Multimodal deep learning, Proceedings of the 28th international conference on machine learning (ICML-11), pp.689-696, 2011.

K. Nogueira, O. A. B. Penatti, and J. A. dos Santos, Towards better exploiting convolutional neural networks for remote sensing scene classification, Pattern Recognition, vol.61, 2016.
DOI : 10.1016/j.patcog.2016.07.001
URL : http://arxiv.org/pdf/1602.01517

H. Noh, S. Hong, and B. Han, Learning Deconvolution Network for Semantic Segmentation, 2015 IEEE International Conference on Computer Vision (ICCV), pp.1520-1528, 2015.
DOI : 10.1109/ICCV.2015.178
URL : http://arxiv.org/pdf/1505.04366

S. Paisitkriangkrai, J. Sherrah, P. Janney, and A. van den Hengel, Effective semantic pixel labelling with convolutional networks and Conditional Random Fields, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp.36-43, 2015.
DOI : 10.1109/CVPRW.2015.7301381

O. A. B. Penatti, K. Nogueira, and J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp.44-51, 2015.

N. T. Quang, N. T. Thuy, D. V. Sang, and H. T. T. Binh, An Efficient Framework for Pixel-wise Building Segmentation from Aerial Images, Proceedings of the Sixth International Symposium on Information and Communication Technology, p.43, 2015.

F. Rottensteiner, G. Sohn, J. Jung, M. Gerke, C. Baillard et al., The ISPRS Benchmark on Urban Object Classification and 3D Building Reconstruction, ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, vol.3, issue.3, 2012.
DOI : 10.5194/isprsannals-I-3-293-2012
URL : https://www.isprs-ann-photogramm-remote-sens-spatial-inf-sci.net/I-3/293/2012/isprsannals-I-3-293-2012.pdf

J. Sherrah, Fully Convolutional Networks for Dense Semantic Labelling of High-Resolution Aerial Imagery, 2016.

K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014.

M. Volpi and D. Tuia, Dense Semantic Labeling of Subdecimeter Resolution Images With Convolutional Neural Networks, IEEE Transactions on Geoscience and Remote Sensing, vol.55, issue.2, pp.881-893, 2017.
DOI : 10.1109/TGRS.2016.2616585
URL : http://arxiv.org/pdf/1608.00775

Z. Wu, C. Shen, and A. van den Hengel, High-performance Semantic Segmentation Using Very Deep Fully Convolutional Networks, 2016.

Z. Yan, H. Zhang, Y. Jia, T. Breuel, and Y. Yu, Combining the Best of Convolutional Layers and Recurrent Layers: A Hybrid Network for Semantic Segmentation, 2016.

J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, How transferable are features in deep neural networks?, Advances in Neural Information Processing Systems, pp.3320-3328, 2014.

F. Yu and V. Koltun, Multi-Scale Context Aggregation by Dilated Convolutions, Proceedings of the International Conference on Learning Representations, 2015.

J. Zhao, M. Mathieu, R. Goroshin, and Y. LeCun, Stacked What-Where Auto-encoders, Proceedings of the International Conference on Learning Representations, 2015.

W. Zhao and S. Du, Learning multiscale and deep representations for classifying remotely sensed imagery, ISPRS Journal of Photogrammetry and Remote Sensing, vol.113, pp.155-165, 2016.
DOI : 10.1016/j.isprsjprs.2016.01.004

S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su et al., Conditional Random Fields as Recurrent Neural Networks, 2015 IEEE International Conference on Computer Vision (ICCV), pp.1529-1537, 2015.
DOI : 10.1109/ICCV.2015.179
URL : http://arxiv.org/pdf/1502.03240