J. Aczél and Z. Daróczy, Charakterisierung der Entropien positiver Ordnung und der Shannonschen Entropie, Acta Mathematica Academiae Scientiarum Hungaricae, vol.14, 1963.
DOI : 10.1007/BF01901932

G. H. Bakir, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar et al., Predicting Structured Data, 2007.

A. Behl, C. V. Jawahar, and M. P. Kumar, Optimizing average precision using weakly supervised data, CVPR, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00984699

M. B. Blaschko and C. H. Lampert, Learning to Localize Objects with Structured Output Regression, ECCV, 2008.
DOI : 10.1007/978-3-540-88682-2_2

Y. Boykov, O. Veksler, and R. Zabih, Fast approximate energy minimization via graph cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.23, 2001.
DOI : 10.1109/iccv.1999.791245
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.112.6806

A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B, 1977.

M. D. Esteban and D. Morales, A summary on entropy statistics, Kybernetika, vol.31, issue.4, 1995.

M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.88, issue.2, 2010.
DOI : 10.1007/s11263-009-0275-4

P. F. Felzenszwalb, R. Girshick, and D. A. McAllester, Discriminatively trained deformable part models, release 4.
DOI : 10.1109/cvpr.2008.4587597
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.156.686

P. F. Felzenszwalb, D. A. McAllester, and D. Ramanan, A discriminatively trained, multiscale, deformable part model, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587597

S. Fothergill, H. M. Mentis, P. Kohli, and S. Nowozin, Instructing people for training gestural interactive systems, Proceedings of the 2012 ACM annual conference on Human Factors in Computing Systems, CHI '12, 2012.
DOI : 10.1145/2207676.2208303

A. Gelman, J. Carlin, H. Stern, and D. Rubin, Bayesian Data Analysis, 1995.

M. P. Kumar, H. Turki, D. Preston, and D. Koller, Learning specific-class segmentation from diverse data, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126446

A. M. Lehrmann, P. V. Gehler, and S. Nowozin, Efficient Nonlinear Markov Models for Human Motion, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.171

S. Maji, L. D. Bourdev, and J. Malik, Action recognition from a distributed representation of pose and appearance, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995631

A. Mathai and P. Rathie, Basic Concepts in Information Theory and Statistics: Axiomatic Foundations and Applications, 1975.

K. Miller, M. P. Kumar, B. Packer, D. Goodman, and D. Koller, Max-margin min-entropy models, AISTATS, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00773602

R. Neal and G. Hinton, A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants, Learning in Graphical Models, 1999.
DOI : 10.1007/978-94-011-5014-9_12

S. Nowozin and C. H. Lampert, Structured learning and prediction in computer vision, Foundations and Trends in Computer Graphics and Vision, 2011.

S. Nowozin and J. Shotton, Action points: A representation for low-latency online human action recognition.

W. Ping, Q. Liu, and A. Ihler, Marginal structured SVM with hidden variables, ICML, 2014.

J. Salojärvi, K. Puolamäki, and S. Kaski, Expectation maximization algorithms for conditional likelihoods, Proceedings of the 22nd international conference on Machine learning, ICML '05, 2005.
DOI : 10.1145/1102351.1102446

A. G. Schwing, T. Hazan, M. Pollefeys, and R. Urtasun, Efficient structured prediction with latent variables for general graphical models, ICML, 2012.

R. Sundberg, Maximum likelihood theory for incomplete data from an exponential family, Scandinavian Journal of Statistics, 1974.

B. Taskar, C. Guestrin, and D. Koller, Max-margin Markov networks, NIPS, 2003.

I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun, Support vector machine learning for interdependent and structured output spaces, Twenty-first international conference on Machine learning, ICML '04, 2004.
DOI : 10.1145/1015330.1015341

H. Wang, S. Gould, and D. Koller, Discriminative learning with latent variables for cluttered indoor scene understanding, Communications of the ACM, vol.56, issue.4, 2013.
DOI : 10.1145/2436256.2436276

C. Yu and T. Joachims, Learning structural SVMs with latent variables, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553523
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.147.1203

A. L. Yuille and A. Rangarajan, The Concave-Convex Procedure (CCCP), NIPS, 2002.

J. Zhu and E. P. Xing, Maximum entropy discrimination Markov networks, JMLR, 2009.

J. Zhu, E. P. Xing, and B. Zhang, Partially observed maximum entropy discrimination Markov networks, NIPS, 2008.

L. Zhu, Y. Chen, A. L. Yuille, and W. Freeman, Latent hierarchical structural learning for object detection, CVPR, 2010.