K. Fukunaga, Introduction to Statistical Pattern Recognition, 1972.

R. Duda, P. Hart, and D. Stork, Pattern Classification, 2001.

B. Schölkopf and A. J. Smola, Learning with kernels : support vector machines, regularization, optimization, and beyond, 2002.

S. Boucheron, O. Bousquet, and G. Lugosi, Theory of classification : a survey of some recent advances, ESAIM : Probability and Statistics, pp.323-375, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00017923

M. R. Genesereth and N. J. Nilsson, Logical Foundations of Artificial Intelligence, 1987.

O. Bousquet, S. Boucheron, and G. Lugosi, Introduction to statistical learning theory, Advanced Lectures on Machine Learning, pp.169-207, 2003.

J. Langford, Tutorial on practical prediction theory for classification, Journal of Machine Learning Research, vol.6, pp.273-306, 2005.

W. Hoeffding, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association, vol.58, pp.13-30, 1963.

P. L. Tchebychev, Des valeurs moyennes, Journal de mathématiques pures et appliquées, vol.2, issue.12, pp.177-184, 1867.

V. N. Vapnik and A. J. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities, Theory of Probability and its Applications, vol.16, pp.264-280, 1971.

N. Sauer, On the density of families of sets, Journal of Combinatorial Theory, vol.13, issue.1, pp.145-147, 1972.

S. Shelah, A combinatorial problem : Stability and order for models and theories in infinitary languages, Pacific Journal of Mathematics, vol.41, pp.247-261, 1972.

H. Brönnimann and M. T. Goodrich, Almost optimal set covers in finite VC-dimension, Discrete and Computational Geometry, vol.14, issue.4, pp.463-479, 1995.

N. Cesa-Bianchi and D. Haussler, A graph-theoretic generalization of the Sauer-Shelah lemma, Discrete Applied Mathematics, vol.86, pp.27-35, 1998.

M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning, 2012.

V. N. Vapnik and A. J. Chervonenkis, Theory of pattern recognition, Nauka, 1974.

V. I. Koltchinskii and D. Panchenko, Rademacher processes and bounding the risk of function learning, pp.443-459, 2000.

V. I. Koltchinskii, Rademacher penalties and structural risk minimization, IEEE Transactions on Information Theory, vol.47, issue.5, pp.1902-1914, 2001.

P. Massart, Some applications of concentration inequalities to statistics, Annales de la faculté des sciences de Toulouse, vol.9, pp.245-303, 2000.

P. L. Bartlett and S. Mendelson, Rademacher and Gaussian complexities : risk bounds and structural results, Journal of Machine Learning Research, vol.3, pp.463-482, 2003.

J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, 2004.

A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth, Learnability and the Vapnik-Chervonenkis dimension, Journal of the ACM, vol.36, pp.929-965, 1989.

A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant, A general lower bound on the number of examples needed for learning, Information and Computation, vol.82, pp.247-261, 1989.

E. Giné, Empirical processes and applications : an overview, Bernoulli, vol.2, issue.1, pp.1-28, 1996.

C. McDiarmid, On the method of bounded differences, Surveys in combinatorics, vol.141, pp.148-188, 1989.

M. Ledoux and M. Talagrand, Probability in Banach Spaces : Isoperimetry and Processes, 1991.

A. Antos, B. Kégl, T. Linder, and G. Lugosi, Data-dependent margin-based generalization bounds for classification, Journal of Machine Learning Research, vol.3, pp.73-98, 2003.

R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI), pp.1137-1143, 1995.

P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe, Convexity, classification, and risk bounds, Journal of the American Statistical Association, vol.101, issue.473, pp.138-156, 2006.

S. Boyd and L. Vandenberghe, Convex Optimization, 2004.

J. Nocedal and S. J. Wright, Numerical Optimization, 2006.

D. D. Lewis, Y. Yang, T. Rose, and F. Li, RCV1 : A new benchmark collection for text categorization research, Journal of Machine Learning Research, vol.5, pp.361-397, 2004.

F. Sebastiani, Machine learning in automated text categorization, ACM Computing Surveys, vol.34, issue.1, pp.1-47, 2002.

R. W. Hamming, Error detecting and error correcting codes, Bell System Technical Journal, vol.29, issue.2, pp.147-160, 1950.

T. G. Dietterich and G. Bakiri, Solving multiclass learning problems via error-correcting output codes, Journal of Artificial Intelligence Research, vol.2, pp.263-286, 1995.

E. L. Allwein, R. E. Schapire, and Y. Singer, Reducing multiclass to binary : A unifying approach for margin classifiers, Journal of Machine Learning Research, vol.1, pp.113-141, 2000.

Y. Guermeur, SVM multiclasses, théorie et applications, Habilitation à diriger des recherches, 2007.

Y. Guermeur, Sample complexity of classifiers taking values in R^Q, application to multi-class SVMs, Communications in Statistics - Theory and Methods, vol.39, issue.3, pp.543-557, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00593980

D. E. Rumelhart, G. E. Hinton, and R. Williams, Learning internal representations by error propagation, Parallel Distributed Processing : Explorations in the Microstructure of Cognition, vol.I, 1986.

L. Bottou, Online algorithms and stochastic approximations, Online Learning and Neural Networks, 1998.

L. Bottou, Large-scale machine learning with stochastic gradient descent, Proceedings of the 19th International Conference on Computational Statistics (COMPSTAT'2010), pp.177-187, 2010.

F. Bach and E. Moulines, Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), Advances in Neural Information Processing Systems (NIPS 26), pp.773-781, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00831977

A. S. Nemirovski and D. B. Yudin, Problem complexity and method efficiency in optimization, 1983.

A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM Journal on Optimization (SIOPT), vol.19, issue.4, pp.1574-1609, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00976649

P. E. Gill and M. W. Leonard, Reduced-Hessian quasi-Newton methods for unconstrained optimization, SIAM Journal on Optimization (SIOPT), vol.12, issue.1, 2001.

P. Deuflhard, Newton Methods for Nonlinear Problems : Affine Invariance and Adaptive Algorithms, 2004.

J. F. Bonnans, J. C. Gilbert, C. Lemaréchal, and C. Sagastizábal, Numerical optimization, theoretical and numerical aspects, 2006.

R. Fletcher, Practical methods of optimization, 1987.

W. C. Davidon, Variable metric method for minimization, SIAM Journal on Optimization (SIOPT), vol.1, issue.1, pp.1-17, 1991.

E. Polak, Computational methods in optimization, 1971.

P. Wolfe, Convergence conditions for ascent methods, SIAM Review, vol.11, issue.2, pp.226-235, 1969.

L. Armijo, Minimization of functions having Lipschitz continuous first partial derivatives, Pacific Journal of Mathematics, vol.16, issue.1, pp.1-3, 1966.

G. Zoutendijk, Some recent development in nonlinear programming, 5th Conference on Optimization Techniques, pp.407-417, 1973.

J. E. Dennis, Jr. and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations (Classics in Applied Mathematics, 16), Society for Industrial and Applied Mathematics, 1996.

K. E. Atkinson, An introduction to numerical analysis, 1988.

M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, Journal of Research of the National Bureau of Standards, vol.49, pp.409-436, 1952.

E. Polak and G. Ribière, Note sur la convergence de méthodes de directions conjuguées, ESAIM : Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique, vol.3, issue.R1, pp.35-43, 1969.

R. Fletcher and C. M. Reeves, Function minimization by conjugate gradients, The Computer Journal, vol.7, issue.2, pp.149-154, 1964.

W. S. McCulloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, vol.5, pp.115-133, 1943.

A. M. Turing, Computing machinery and intelligence, Mind, vol.59, pp.433-460, 1950.

D. Hebb, The Organization of Behavior, 1949.

M. Minsky, A neural-analogue calculator based upon a probability model of reinforcement, 1952.

F. Rosenblatt, The perceptron : A probabilistic model for information storage and organization in the brain, Psychological Review, vol.65, pp.386-408, 1958.

A. B. Novikoff, On convergence proofs on perceptrons, Symposium on the Mathematical Theory of Automata, vol.12, pp.615-622, 1962.

B. Widrow and M. E. Hoff, Adaptive switching circuits, Institute of Radio Engineers, Western Electronic Show and Convention, Convention Record, vol.4, pp.96-104, 1960.

N. J. Nilsson, Learning machines ; foundations of trainable pattern-classifying systems, 1965.

M. Minsky and S. Papert, Perceptrons : An Introduction to Computational Geometry, 1969.

J. A. Anderson and E. Rosenfeld, Neurocomputing : Foundations of Research, 1988.

J. J. Hopfield, Neurons with graded response have collective computational properties like those of two-state neurons, Proceedings of the National Academy of Sciences USA, pp.3088-3092, 1984.

Y. LeCun, L. Bottou, and Y. Bengio, Reading checks with multilayer graph transformer networks, Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997.

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.

V. N. Vapnik, The nature of statistical learning theory, 1995.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, vol.25, pp.1097-1105, 2012.

Y. Freund and R. E. Schapire, Large margin classification using the perceptron algorithm, Machine Learning Journal, vol.37, pp.277-296, 1999.

J. Truett, J. Cornfield, and W. Kannel, A multivariate analysis of the risk of coronary heart disease in Framingham, Journal of Chronic Diseases, vol.20, issue.7, pp.511-524, 1967.

J. A. Anderson, Logistic discrimination, Handbook of Statistics, vol.2, pp.169-191, 1982.

M. Kupperman, Probabilities of hypotheses and information-statistics in sampling from exponential-class populations, Annals of Mathematical Statistics, vol.29, issue.2, pp.571-575, 1958.

P. J. Werbos, Beyond Regression : New Tools for Prediction and Analysis in the Behavioral Sciences, 1974.

D. Parker, Learning logic, 1985.

L. Bottou, Une Approche théorique de l'Apprentissage Connexionniste : Applications à la Reconnaissance de la Parole, 1991.

D. Hubel and T. Wiesel, Receptive fields and functional architecture of monkey striate cortex, Journal of Physiology, vol.195, issue.1, pp.215-243, 1968.

K. Fukushima, Neocognitron : A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, vol.36, issue.4, 1980.

A. Waibel, Phoneme recognition using time-delay neural networks, Meeting of the Institute of Electrical, Information and Communication Engineers, 1987.

Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard et al., Backpropagation applied to handwritten zip code recognition, Neural Computation, vol.1, issue.4, pp.541-551, 1989.

A. J. Robinson and F. Fallside, The utility driven dynamic error propagation network, 1987.

R. Williams and D. Zipser, Gradient-based learning algorithms for recurrent networks and their computational complexity, Backpropagation : theory, architectures, and applications, pp.433-486, 1995.

M. D. Richard and R. P. Lippmann, Neural network classifiers estimate Bayesian a posteriori probabilities, Neural Computation, vol.3, issue.4, pp.461-483, 1991.

T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2001.

V. Nair and G. Hinton, Rectified linear units improve restricted Boltzmann machines, Proceedings of the 27th International Conference on Machine Learning, 2010.

B. Polyak, Some methods of speeding up the convergence of iteration methods, USSR Computational Mathematics and Mathematical Physics, vol.4, issue.5, pp.1-17, 1964.

Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k^2), Soviet Mathematics Doklady, vol.27, pp.372-376, 1983.

I. Sutskever, J. Martens, G. Dahl, and G. Hinton, On the importance of initialization and momentum in deep learning, Proceedings of the 30th International Conference on Machine Learning, pp.1139-1147, 2013.

S. Ioffe and C. Szegedy, Batch normalization : Accelerating deep network training by reducing internal covariate shift, Proceedings of the 32nd International Conference on Machine Learning, pp.448-456, 2015.

G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors, CoRR, 2012.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout : A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, pp.1929-1958, 2014.

B. Boser, I. Guyon, and V. Vapnik, A training algorithm for optimal margin classifiers, Proceedings of the Fifth Annual Workshop on Computational Learning Theory, 1992.

T. Joachims, Making large-scale SVM learning practical, Advances in Kernel Methods - Support Vector Learning, pp.169-184, 1999.

R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, LIBLINEAR : A library for large linear classification, Journal of Machine Learning Research, vol.9, pp.1871-1874, 2008.

S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter, Pegasos : primal estimated sub-gradient solver for SVM, Mathematical Programming, vol.127, issue.1, pp.3-30, 2011.

J. Mercer, Functions of positive and negative type and their connection with the theory of integral equations, Philosophical Transactions of the Royal Society, vol.209, pp.415-446, 1909.

J. Weston and C. Watkins, Support vector machines for multi-class pattern recognition, European Symposium on Artificial Neural Networks (ESANN), pp.219-224, 1999.

K. Crammer and Y. Singer, On the algorithmic implementation of multiclass kernel-based vector machines, Journal of Machine Learning Research, vol.2, pp.265-292, 2001.

Y. Lee, Y. Lin, and G. Wahba, Multicategory support vector machines : Theory and application to the classification of microarray data and satellite radiance data, Journal of the American Statistical Association, vol.99, issue.465, pp.67-81, 2004.

L. Valiant, The theory of the learnable, Communications of the ACM, vol.27, issue.11, pp.1134-1142, 1984.

M. Kearns and L. Valiant, Learning Boolean formulae or finite automata is as hard as factoring, 1988.

R. E. Schapire, The strength of weak learnability, Machine Learning, vol.5, pp.197-227, 1990.

R. E. Schapire, Theoretical views of boosting and applications, Proceedings of the 10th International Conference on Algorithmic Learning Theory, pp.13-25, 1999.

Y. Freund and R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, vol.55, issue.1, pp.119-139, 1997.

R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, Boosting the margin : a new explanation for the effectiveness of voting methods, The Annals of Statistics, vol.26, issue.5, pp.1651-1680, 1998.

A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological), vol.39, issue.1, pp.1-38, 1977.

D. M. Titterington, A. F. Smith, and U. E. Makov, Statistical Analysis of Finite Mixture Distributions, 1985.

M. J. Symons, Clustering criteria and multivariate normal mixtures, Biometrics, vol.37, issue.1, pp.35-43, 1981.

G. Celeux and G. Govaert, A classification EM algorithm for clustering and two stochastic versions, Computational Statistics and Data Analysis, vol.14, issue.3, pp.315-332, 1992.
URL : https://hal.archives-ouvertes.fr/inria-00075196

T. Zhang and F. J. Oles, A probability analysis on the value of unlabeled data for classification problems, 17th International Conference on Machine Learning, 2000.

F. G. Cozman and I. Cohen, Unlabeled data can degrade classification performance of generative classifiers, Fifteenth International Florida Artificial Intelligence Society Conference, pp.327-331, 2002.

M. Seeger, Learning with labeled and unlabeled data, tech. rep, 2001.

S. Basu, A. Banerjee, and R. J. Mooney, Semi-supervised clustering by seeding, Proceedings of the Nineteenth International Conference on Machine Learning, pp.27-34, 2002.

G. J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition, 1992.

Y. Grandvalet and Y. Bengio, Semi-supervised learning by entropy minimization, Advances in Neural Information Processing Systems (NIPS 17), pp.529-536, 2005.

O. Chapelle, B. Schölkopf, and A. Zien, Semi-Supervised Learning, 2006.

K. Nigam, A. K. McCallum, S. Thrun, and T. Mitchell, Text classification from labeled and unlabeled documents using EM, Machine Learning Journal, vol.39, issue.2-3, pp.103-134, 2000.

M. Amini and É. Gaussier, Recherche d'Information - applications, modèles et algorithmes, Eyrolles, 2013.

I. Cohen, F. G. Cozman, N. Sebe, M. C. Cirelo, and T. S. Huang, Semisupervised learning of classifiers : Theory, algorithms, and their application to human-computer interaction, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.26, issue.12, pp.1553-1567, 2004.

S. C. Fralick, Learning to recognize patterns without a teacher, IEEE Transactions on Information Theory, vol.13, issue.1, pp.57-64, 1967.

E. A. Patrick, J. P. Costello, and F. C. Monds, Decision-directed estimation of a two-class decision boundary, IEEE Transactions on Information Theory, vol.16, issue.3, pp.197-205, 1970.

G. Tür, D. Z. Hakkani-Tür, and R. E. Schapire, Combining active and semi-supervised learning for spoken language understanding, Speech Communication, vol.45, issue.2, pp.171-186, 2005.

R. Urner, S. Shalev-Shwartz, and S. Ben-David, Access to unlabeled data can speed up prediction time, 28th International Conference on Machine Learning, pp.641-648, 2011.

P. Derbeko, R. El-Yaniv, and R. Meir, Error bounds for transductive learning via compression and clustering, Advances in Neural Information Processing Systems (NIPS 15), pp.1085-1092, 2003.

T. Joachims, Transductive inference for text classification using support vector machines, Proceedings of the 16th International Conference on Machine Learning, pp.200-209, 1999.

T. Joachims, Learning to Classify Text Using Support Vector Machines : Methods, Theory and Algorithms, 2002.

Z. Luo and P. Tseng, On the convergence of the coordinate descent method for convex differentiable minimization, Journal of Optimization Theory and Applications, vol.72, issue.1, pp.7-35, 1992.

M. Amini, N. Usunier, and F. Laviolette, A transductive bound for the voted classifier with an application to semi-supervised learning, Advances in Neural Information Processing Systems (NIPS 21), pp.65-72, 2009.

G. Dantzig, Maximization of a linear function of variables subject to linear inequalities, Activity Analysis of Production and Allocation (T. Koopmans, ed.), pp.339-347, 1951.

F. Bach, G. R. G. Lanckriet, and M. Jordan, Multiple kernel learning, conic duality, and the SMO algorithm, Proceedings of the Twenty-first International Conference on Machine Learning, 2004.

V. Sindhwani, P. Niyogi, and M. Belkin, A co-regularization approach to semi-supervised learning with multiple views, ICML-05 Workshop on Learning with Multiple Views, pp.74-79, 2005.

A. Blum and T. Mitchell, Combining labeled and unlabeled data with co-training, Proceedings of the 11th Annual Conference on Learning Theory, pp.92-100, 1998.

B. Leskes, The value of agreement, a new boosting algorithm, Proceedings of Conference on Learning Theory (COLT), pp.95-110, 2005.

X. Zhu and Z. Ghahramani, Learning from labeled and unlabeled data with label propagation, 2002.

X. Zhu, Z. Ghahramani, and J. Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, 20th International Conference on Machine Learning, pp.912-919, 2003.

D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf, Learning with local and global consistency, Advances in Neural Information Processing Systems (NIPS 16), pp.321-328, 2004.

G. Latouche and V. Ramaswami, Introduction to matrix analytic methods in stochastic modeling, Society for Industrial and Applied Mathematics, 1999.

M. Szummer and T. Jaakkola, Partially labeled classification with Markov random walks, Advances in Neural Information Processing Systems (NIPS 14), pp.945-952, 2002.

E. W. Montroll, Random walks in multidimensional spaces, especially on periodic lattices, Journal of the Society for Industrial and Applied Mathematics (SIAM), vol.4, issue.4, pp.241-260, 1956.

W. W. Cohen, R. E. Schapire, and Y. Singer, Learning to order things, Advances in Neural Information Processing Systems (NIPS 10), pp.451-457, 1998.

C. Rudin, C. Cortes, M. Mohri, and R. E. Schapire, Margin-based ranking meets boosting in the middle, Conference On Learning Theory (COLT), 2005.

S. Agarwal, T. Graepel, R. Herbrich, S. Har-Peled, and D. Roth, Generalization bounds for the area under the ROC curve, Journal of Machine Learning Research, vol.6, pp.393-425, 2005.

C. Cortes and M. Mohri, AUC optimization vs. error rate minimization, Advances in Neural Information Processing Systems (NIPS 16), pp.313-320, 2004.

G. Salton, A vector space model for automatic indexing, Communications of the ACM, vol.18, issue.11, pp.613-620, 1975.

S. Hill, H. Zaragoza, R. Herbrich, and P. Rayner, Average Precision and the Problem of Generalisation, SIGIR Workshop on Mathematical and Formal Methods in Information Retrieval, 2002.

S. E. Robertson and S. Walker, Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval, SIGIR'94, conference on Research and development in information retrieval, pp.232-241, 1994.

S. Clinchant and É. Gaussier, Information-based models for ad hoc IR, SIGIR'10, conference on Research and development in information retrieval, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00953830

P. McCullagh, Regression models for ordinal data, Journal of the Royal Statistical Society. Series B (Methodological), vol.42, issue.2, pp.109-142, 1980.

K. Crammer and Y. Singer, Pranking with ranking, Advances in Neural Information Processing Systems (NIPS 14), pp.641-647, 2002.

A. Shashua and A. Levin, Ranking with large margin principle : Two approaches, Advances in Neural Information Processing Systems (NIPS 15), pp.961-968, 2003.

W. Chu and S. S. Keerthi, New approaches to support vector ordinal regression, 22nd International Conference on Machine Learning, pp.145-152, 2005.

T. Qin, T. Liu, and H. Li, A general approximation framework for direct optimization of information retrieval measures, 2008.

M. Taylor, J. Guiver, S. Robertson, and T. Minka, SoftRank : Optimising non-smooth rank metrics, WSDM 2008, 2008.

Y. Yue, T. Finley, F. Radlinski, and T. Joachims, A support vector method for optimizing average precision, SIGIR '07, pp.271-278, 2007.

J. Xu and H. Li, AdaRank : A boosting algorithm for information retrieval, SIGIR '07, conference on Research and Development in Information Retrieval, pp.391-398, 2007.

J. Xu, T. Liu, M. Lu, H. Li, and W. Ma, Directly optimizing evaluation measures in learning to rank, SIGIR '08, pp.107-114, 2008.

C. Calauzènes, N. Usunier, and P. Gallinari, On the (non-)existence of convex, calibrated surrogate losses for ranking, Advances in Neural Information Processing Systems (NIPS 25), pp.197-205, 2012.

Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer, An efficient boosting algorithm for combining preferences, Journal of Machine Learning Research, vol.4, pp.933-969, 2003.

N. Usunier, Apprentissage de fonctions d'ordonnancement : une étude théorique de la réduction à la classification et deux applications à la Recherche d'Information, 2006.

S. K. Wong and Y. Y. Yao, Linear structure in information retrieval, SIGIR'88, conference on Research and Development in Information Retrieval, pp.219-232, 1988.

R. Herbrich, T. Graepel, P. Bollmann-Sdorra, and K. Obermayer, Learning preference relations for information retrieval, Proceedings of the AAAI Workshop Text Categorization and Machine Learning, 1998.

A. Rakotomamonjy, Optimizing area under ROC curve with SVMs, 1st International workshop on ROC Analysis in Artificial Intelligence, pp.71-80, 2004.

N. Usunier, M. Amini, and P. Gallinari, Generalization error bounds for classifiers trained with interdependent data, Advances in Neural Information Processing Systems (NIPS 18), pp.1369-1376, 2006.

H. Chernoff, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, Annals of Mathematical Statistics, vol.23, issue.4, pp.493-507, 1952.

P. Barbe and M. Ledoux, Probabilité, 2007.

W. Feller, An Introduction to Probability Theory and Its Applications, 1968.

T. Bayes, An essay towards solving a problem in the doctrine of chances, Philosophical Transactions of the Royal Society of London, vol.53, pp.370-418, 1763.

P. S. Laplace, Mémoire sur la probabilité des causes par les événements, Académie Royale des sciences de Paris (Savants étrangers), vol.6, pp.621-656, 1774.

S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequalities : A Nonasymptotic Theory of Independence, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00794821