S. Alchatzidis, A. Sotiras, and N. Paragios, Discrete Multi Atlas Segmentation using Agreement Constraints, Proceedings of the British Machine Vision Conference 2014, 2014.
DOI : 10.5244/C.28.20
URL : https://hal.archives-ouvertes.fr/hal-01061457

D. Anguelov, B. Taskar, V. Chatalbashev, D. Koller, D. Gupta et al., Discriminative Learning of Markov Random Fields for Segmentation of 3D Scan Data, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.169-176, 2005.
DOI : 10.1109/CVPR.2005.133

F. Bach, Structured sparsity-inducing norms through submodular functions, Advances in Neural Information Processing Systems, pp.118-126, 2010.
DOI : 10.1214/12-sts394
URL : https://hal.archives-ouvertes.fr/hal-00511310

F. Bach, Learning with submodular functions: A convex optimization perspective, Foundations and Trends in Machine Learning, pp.145-373, 2013.
DOI : 10.1561/2200000039
URL : https://hal.archives-ouvertes.fr/hal-00645271

P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe, Convexity, classification, and risk bounds, Journal of the American Statistical Association, vol.101, issue.473, pp.138-156, 2006.

Y. Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, vol.2, issue.1, pp.1-127, 2009.
DOI : 10.1561/2200000006
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.527

D. P. Bertsekas, Nonlinear Programming, Athena, 1999.

A. Blake, C. Rother, M. Brown, P. Perez, and P. Torr, Interactive Image Segmentation Using an Adaptive GMMRF Model, ECCV, pp.428-441, 2004.
DOI : 10.1007/978-3-540-24670-1_33
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.86.980

M. B. Blaschko, Branch and Bound Strategies for Non-maximal Suppression in Object Detection, Energy Minimization Methods in Computer Vision and Pattern Recognition, pp.385-398, 2011.

M. B. Blaschko and C. H. Lampert, Learning to localize objects with structured output regression, European Conference on Computer Vision, pp.2-15, 2008.

M. B. Blaschko and J. Yu, Hardness results for structured learning and inference with multiple correct outputs, Constructive Machine Learning Workshop at ICML, 2015.

L. Bottou and O. Bousquet, The tradeoffs of large scale learning, Advances in Neural Information Processing Systems, pp.161-168, 2008.

S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, Foundations and Trends in Machine Learning, pp.1-122, 2011.
DOI : 10.1561/2200000016

Y. Boykov and V. Kolmogorov, An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.26, issue.9, pp.1124-1137, 2004.
DOI : 10.1109/TPAMI.2004.60

Y. Boykov, O. Veksler, and R. Zabih, Fast approximate energy minimization via graph cuts, T-PAMI, pp.1222-1239, 2001.
DOI : 10.1109/iccv.1999.791245
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.112.6806

D. Chakrabarty, P. Jain, and P. Kothari, Provable submodular minimization using Wolfe's algorithm, NIPS, 2014.

G. Charpiat, Exhaustive family of energies minimizable exactly by a graph cut, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995567
URL : https://hal.archives-ouvertes.fr/inria-00616370

Y. Chen, H. Shioi, C. Fuentes-montesinos, L. P. Koh, S. Wich et al., Active detection via adaptive submodularity, ICML, pp.55-63, 2014.

W. Cheng, E. Hüllermeier, and K. J. Dembczynski, Bayes optimal multilabel classification via probabilistic classifier chains, Proceedings of the International Conference on Machine Learning, pp.279-286, 2010.

G. Choquet, Theory of capacities, Annales de l'institut Fourier, pp.131-295, 1953.
DOI : 10.5802/aif.53

P. Clifford, Markov random fields in statistics, Disorder in Physical Systems: A Volume in Honour of John M. Hammersley, pp.19-32, 1990.

M. Collins and N. Duffy, Convolution kernels for natural language, Advances in neural information processing systems, pp.625-632, 2001.

M. Collins and N. Duffy, New ranking algorithms for parsing and tagging, Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pp.263-270, 2002.
DOI : 10.3115/1073083.1073128

C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, vol.20, issue.3, pp.273-297, 1995.
DOI : 10.1007/BF00994018

K. Crammer and Y. Singer, On the algorithmic implementation of multiclass kernel-based vector machines, Journal of Machine Learning Research, vol.2, pp.265-292, 2001.

A. Criminisi and J. Shotton, Decision forests for computer vision and medical image analysis, 2013.
DOI : 10.1007/978-1-4471-4929-3

L. R. Dice, Measures of the amount of ecologic association between species, Ecology, vol.26, issue.3, pp.297-302, 1945.

J. Díez, O. Luaces, J. Del-coz, and A. Bahamonde, Optimizing different loss functions in multilabel classifications, Progress in Artificial Intelligence, vol.40, issue.7, pp.107-118, 2015.
DOI : 10.1007/s13748-014-0060-7

J. R. Doppa, J. Yu, C. Ma, A. Fern, and P. Tadepalli, HC-search for multi-label prediction: An empirical study, Proceedings of AAAI Conference on Artificial Intelligence, 2014.

J. Edmonds, Matroids and the greedy algorithm, Mathematical Programming, vol.1, issue.1, pp.127-136, 1971.
DOI : 10.1007/BF01584082

M. Everingham, J. Sivic, and A. Zisserman, "Hello! My name is... Buffy" -- Automatic Naming of Characters in TV Video, Proceedings of the British Machine Vision Conference 2006, 2006.
DOI : 10.5244/C.20.92

M. Everingham, J. Sivic, and A. Zisserman, Taking the bite out of automatic naming of characters in TV video, Image and Vision Computing, vol.27, issue.5, 2009.

M. Everingham, L. Van-gool, C. K. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.73, issue.2, pp.303-338, 2010.
DOI : 10.1007/s11263-009-0275-4
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.167.6629

V. Feldman, V. Guruswami, P. Raghavendra, and Y. Wu, Agnostic Learning of Monomials by Halfspaces Is Hard, 2009 50th Annual IEEE Symposium on Foundations of Computer Science, pp.1558-1590, 2012.
DOI : 10.1109/FOCS.2009.26
URL : http://arxiv.org/abs/1012.0729

T. Finley and T. Joachims, Training structural SVMs when exact inference is intractable, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.304-311, 2008.
DOI : 10.1145/1390156.1390195
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.147.1386

Y. Freund and R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, European Conference on Computational Learning Theory, pp.23-37, 1995.
DOI : 10.1007/3-540-59119-2_166

J. Friedman, T. Hastie, and R. Tibshirani, Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors), The Annals of Statistics, vol.28, issue.2, pp.337-407, 2000.
DOI : 10.1214/aos/1016218223

S. Fujishige, Lexicographically Optimal Base of a Polymatroid with Respect to a Weight Vector, Mathematics of Operations Research, vol.5, issue.2, pp.186-196, 1980.
DOI : 10.1287/moor.5.2.186

S. Fujishige, Submodular functions and optimization, 2005.

B. Fulkerson, A. Vedaldi, and S. Soatto, Class segmentation and object localization with superpixel neighborhoods, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459175
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.4613

W. Gao and Z. Zhou, On the consistency of multi-label learning, Artificial Intelligence, vol.199-200, pp.22-44, 2013.
DOI : 10.1016/j.artint.2013.03.001

J. Gillenwater, R. Iyer, B. Lusch, R. Kidambi, and J. Bilmes, Submodular Hamming metrics, Neural Information Processing Systems (NIPS), 2015.

M. Grötschel, L. Lovász, and A. Schrijver, The ellipsoid method and its consequences in combinatorial optimization, Combinatorica, vol.2, issue.2, pp.169-197, 1981.
DOI : 10.1007/BF02579273

M. Grötschel, L. Lovász, and A. Schrijver, Geometric algorithms and combinatorial optimization, 1988.

V. Gulshan, C. Rother, A. Criminisi, A. Blake, and A. Zisserman, Geodesic star convexity for interactive image segmentation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.3129-3136, 2010.
DOI : 10.1109/CVPR.2010.5540073

M. Gygli, H. Grabner, and L. Van-gool, Video summarization by learning submodular mixtures of objectives, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3090-3098, 2015.
DOI : 10.1109/CVPR.2015.7298928

F. Han and S. Zhu, Bottom-up/top-down image parsing with attribute grammar, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.1, pp.59-73, 2009.

T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning: Data mining, inference and prediction, 2009.

S. Iwata, A Faster Scaling Algorithm for Minimizing Submodular Functions, SIAM Journal on Computing, vol.32, issue.4, pp.833-840, 2003.
DOI : 10.1137/S0097539701397813
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.8937

R. Iyer and J. Bilmes, Algorithms for approximate minimization of the difference between submodular functions, with applications, Uncertainty in Artificial Intelligence (UAI), 2012.

R. Iyer and J. Bilmes, The Lovász-Bregman divergence and connections to rank aggregation, clustering, and web ranking: Extended version, Uncertainty in Artificial Intelligence, 2013.

T. Joachims, T. Finley, and C. Yu, Cutting-plane training of structural SVMs, Machine Learning, pp.27-59, 2009.
DOI : 10.1007/s10994-009-5108-8
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.1367

A. Kirillov, D. Schlesinger, D. Vetrov, C. Rother, and B. Savchynskyy, M-best-diverse labelings for submodular energies and beyond, Proceedings of the 28th International Conference on Neural Information Processing Systems, pp.613-621, 2015.

D. Koller, N. Friedman, L. Getoor, and B. Taskar, Graphical models in a nutshell, Introduction to Statistical Relational Learning, 2007.

A. N. Kolmogorov and S. V. Fomin, Introductory Real Analysis, 1975.

V. Kolmogorov, Minimizing a sum of submodular functions, Discrete Applied Mathematics, vol.160, issue.15, pp.2246-2258, 2012.
DOI : 10.1016/j.dam.2012.05.025

V. Kolmogorov and R. Zabih, What energy functions can be minimized via graph cuts?, IEEE Transactions on Pattern Analysis and Machine Intelligence, pp.147-159, 2004.
DOI : 10.1007/3-540-47977-5_5
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.113.1823

N. Komodakis, N. Paragios, and G. Tziritas, MRF Optimization via Dual Decomposition: Message-Passing Revisited, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4408890
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.3345

A. Krause, SFO: A toolbox for submodular function optimization, JMLR, vol.11, pp.1141-1144, 2010.

A. Krause and D. Golovin, Submodular Function Maximization, Tractability: Practical Approaches to Hard Problems, 2014.
DOI : 10.1017/CBO9781139177801.004
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.696.6310

A. Krause, A. Singh, and C. Guestrin, Near-optimal sensor placements in gaussian processes: Theory, efficient algorithms and empirical studies, Journal of Machine Learning Research, vol.9, issue.Feb, pp.235-284, 2008.

S. Lacoste-julien, M. Jaggi, M. Schmidt, and P. Pletscher, Block-coordinate Frank-Wolfe optimization for structural SVMs, Proceedings of the 30th International Conference on Machine Learning, pp.53-61, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00720158

J. D. Lafferty, A. Mccallum, and F. C. Pereira, Conditional random fields: Probabilistic models for segmenting and labeling sequence data, ICML, 2001.

D. Liben-nowell and J. Kleinberg, The link-prediction problem for social networks
DOI : 10.1002/asi.20591
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.163.6528

H. Lin and J. Bilmes, A class of submodular functions for document summarization, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp.510-520, 2011.

L. Lovász, Submodular functions and convexity, Mathematical Programming The State of the Art, pp.235-257, 1983.
DOI : 10.1007/978-3-642-68874-4_10

O. L. Mangasarian, Uniqueness of solution in linear programming, Linear Algebra and its Applications, vol.25, issue.0, pp.151-162, 1979.
DOI : 10.1016/0024-3795(79)90014-4

D. Mcallester, Generalization bounds and consistency for structured labeling, Predicting Structured Data, 2007.

O. Meshi, N. Srebro, and T. Hazan, Efficient training of structured svms via soft constraints, AISTATS, pp.699-707, 2015.

M. Narasimhan and J. Bilmes, A submodular-supermodular procedure with applications to discriminative structure learning, Uncertainty in Artificial Intelligence (UAI), 2005.

G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher, An analysis of approximations for maximizing submodular set functions-I, Mathematical Programming, pp.265-294, 1978.
DOI : 10.1007/BF01588971

R. Nishihara, S. Jegelka, and M. I. Jordan, On the convergence rate of decomposable submodular function minimization, NIPS, pp.640-648, 2014.

S. Nowozin, Optimal Decisions from Probabilistic Models: The Intersection-over-Union Case, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.77
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.636.8544

S. Nowozin and C. H. Lampert, Structured learning and prediction in computer vision, Foundations and Trends in Computer Graphics and Vision, vol.6, issue.3-4, pp.185-365, 2011.
DOI : 10.1561/0600000033
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.359.7873

O. Ore, Theory of graphs, 1962.
DOI : 10.1090/coll/038

J. B. Orlin, A faster strongly polynomial time algorithm for submodular function minimization, Mathematical Programming, pp.237-251, 2009.
DOI : 10.1007/s10107-007-0189-2

A. Osokin and P. Kohli, Perceptually Inspired Layout-Aware Losses for Image Segmentation, ECCV, 2014.
DOI : 10.1007/978-3-319-10605-2_43

J. Petterson and T. S. Caetano, Submodular multi-label learning, Advances in Neural Information Processing Systems, pp.1512-1520, 2011.

P. Pletscher and P. Kohli, Learning low-order models for enforcing high-order statistics, AISTATS, 2012.

S. J. D. Prince, Computer vision: models, learning, and inference, 2012.

M. Queyranne, Minimizing symmetric submodular functions, Mathematical Programming, vol.26, issue.2, pp.3-12, 1998.
DOI : 10.1007/BF01585863

M. Ranjbar, G. Mori, and Y. Wang, Optimizing Complex Loss Functions in Structured Prediction, Computer Vision - ECCV, pp.580-593, 2010.
DOI : 10.1007/978-3-642-15552-9_42
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.178.7638

A. Sharif-razavian, H. Azizpour, J. Sullivan, and S. Carlsson, CNN features off-the-shelf: An astounding baseline for recognition, IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp.512-519, 2014.

T. Rohlfing, Image Similarity and Tissue Overlaps as Surrogates for Image Registration Accuracy: Widely Used but Unreliable, IEEE Transactions on Medical Imaging, vol.31, issue.2, pp.153-163, 2012.
DOI : 10.1109/TMI.2011.2163944
URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3274625

C. Rother, V. Kolmogorov, V. Lempitsky, and M. Szummer, Optimizing Binary MRFs via Extended Roof Duality, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383203
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.63.4613

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision, vol.115, issue.3, pp.211-252, 2015.
DOI : 10.1007/s11263-015-0816-y
URL : http://arxiv.org/abs/1409.0575

P. Rychlý, A lexicographer-friendly association score, Proceedings of Recent Advances in Slavonic Natural Language Processing, 2008.

M. R. Sabuncu, B. T. T. Yeo, K. Van Leemput, B. Fischl et al., A generative model for image segmentation based on label fusion, IEEE Transactions on Medical Imaging, vol.29, issue.10, pp.1714-1729, 2010.

M. Schmidt, UGM: Matlab code for undirected graphical models, 2012.

A. Schrijver, Combinatorial optimization: polyhedra and efficiency, 2002.

P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus et al., Overfeat: Integrated recognition, localization and detection using convolutional networks, International Conference on Learning Representations, 2014.

J. Sivic, M. Everingham, and A. Zisserman, "Who are you?" - learning person specific classifiers from video, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/cvprw.2009.5206513

T. Sørensen, A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons

P. Stobbe and A. Krause, Efficient minimization of decomposable submodular functions, NIPS, 2010.

M. Szummer, P. Kohli, and D. Hoiem, Learning CRFs Using Graph Cuts, ECCV, pp.582-595, 2008.
DOI : 10.1007/978-3-540-88688-4_43
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.141.7402

D. Tarlow and R. S. Zemel, Structured output learning with high order loss functions, AISTATS, 2012.

D. Tarlow, I. E. Givoni, and R. S. Zemel, HOP-MAP: Efficient message passing with high order potentials, AISTATS, 2010.

B. Taskar, C. Guestrin, and D. Koller, Max-margin markov networks, Advances in Neural Information Processing Systems 16, pp.25-32, 2004.

A. Tewari and P. L. Bartlett, On the Consistency of Multiclass Classification Methods, The Journal of Machine Learning Research, vol.8, pp.1007-1025, 2007.
DOI : 10.1007/11503415_10

I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, Large margin methods for structured and interdependent output variables, Journal of Machine Learning Research, vol.6, issue.9, pp.1453-1484, 2005.

Z. Tu, X. Chen, A. L. Yuille, and S. Zhu, Image Parsing: Unifying Segmentation, Detection, and Recognition, International Journal of Computer Vision, vol.13, issue.1, pp.113-140, 2005.
DOI : 10.1007/s11263-005-6642-x

W. Tutte, Introduction to the Theory of Matroids, 1966.

R. Unnikrishnan, C. Pantofaru, and M. Hebert, Toward objective evaluation of image segmentation algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.29, issue.6, pp.929-944, 2007.
DOI : 10.1109/tpami.2007.1046
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.71.4322

T. Uno, T. Asai, Y. Uchida, and H. Arimura, An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases, Discovery Science, pp.16-31, 2004.
DOI : 10.1007/978-3-540-30214-8_2

V. N. Vapnik, The Nature of Statistical Learning Theory, 1995.

K. Wei, R. Iyer, and J. Bilmes, Submodularity in data subset selection and active learning, International Conference on Machine Learning (ICML), 2015.

N. White, Theory of matroids. Number 26, 1986.

J. Yu and M. B. Blaschko, Lovász hinge for learning submodular losses, NIPS Workshop on Representation and Learning Methods for Complex Outputs, 2014.

J. Yu and M. B. Blaschko, The Lovász hinge: A convex surrogate for submodular losses, 2015.

J. Yu and M. B. Blaschko, Learning submodular losses with the Lovász hinge, Proceedings of the 32nd International Conference on Machine Learning, pp.1623-1631, 2015.

J. Yu and M. B. Blaschko, A convex surrogate operator for general non-modular loss functions, International Conference on Artificial Intelligence and Statistics, pp.1032-1041, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01299519

J. Yu and M. B. Blaschko, A convex surrogate operator for general non-modular loss functions, Benelearn, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01299519

J. Yu and M. B. Blaschko, Efficient learning for discriminative segmentation with supermodular losses, Proceedings of the British Machine Vision Conference, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01349000

J. Yu and M. B. Blaschko, Efficient learning for discriminative segmentation with supermodular losses, Women in Machine Learning Workshop (WiML), 2016.
URL : https://hal.archives-ouvertes.fr/hal-01349000

J. Yu and M. B. Blaschko, An efficient decomposition framework for discriminative segmentation with supermodular losses, 2017.