S. Amari, Natural Gradient Works Efficiently in Learning, Neural Computation, vol.10, issue.2, pp.251-276, 1998.

P. Bartlett, M. Jordan, and J. D. McAuliffe, Convexity, Classification, and Risk Bounds, Journal of the American Statistical Association, vol.101, issue.473, pp.138-156, 2006.
DOI : 10.1198/016214505000000907
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.15.3497

A. Bordes, L. Bottou, and P. Gallinari, SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent, J. Mach. Learn. Res., vol.10, pp.1737-1754, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00750911

A. Bordes, L. Bottou, P. Gallinari, J. Chang, and S. A. Smith, Erratum: SGDQN is Less Careful than Expected, J. Mach. Learn. Res., vol.11, pp.2229-2240, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00750268

L. Bottou and O. Bousquet, The tradeoffs of large scale learning, NIPS*20, pp.161-168, 2008.

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A Large-Scale Hierarchical Image Database, CVPR'09, 2009.

Y. Freund and R. E. Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, Journal of Computer and System Sciences, vol.55, issue.1, pp.119-139, 1997.
DOI : 10.1006/jcss.1997.1504

G. Griffin, A. Holub, and P. Perona, Caltech-256 object category dataset, 2007.

T. Joachims, Optimizing search engines using clickthrough data, Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '02, pp.133-142, 2002.
DOI : 10.1145/775047.775067
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.3161

S. Kakade, S. Shalev-Shwartz, and A. Tewari, Applications of strong convexity–strong smoothness duality to learning with matrices, p.18, 2009.

S. S. Keerthi, O. Chapelle, D. DeCoste, and P. Bennett, Building support vector machines with reduced classifier complexity, JMLR, vol.7, pp.1493-1515, 2006.

T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka, Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost, Proceedings of the 12th ECCV, 2012.
DOI : 10.1007/978-3-642-33709-3_35
URL : https://hal.archives-ouvertes.fr/hal-00722313

R. Nock and F. Nielsen, On the efficient minimization of classification-calibrated surrogates, NIPS*21, pp.1201-1208, 2008.

F. Perronnin, Z. Akata, Z. Harchaoui, and C. Schmid, Towards good practice in large-scale learning for image classification, CVPR'12, pp.3482-3489, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00690014

F. Perronnin, J. Sánchez, and T. Mensink, Improving the Fisher Kernel for Large-Scale Image Classification, Proceedings of the 11th ECCV, pp.143-156, 2010.
DOI : 10.1007/978-3-642-15561-1_11
URL : https://hal.archives-ouvertes.fr/inria-00548630

N. N. Schraudolph, J. Yu, and S. Günter, A Stochastic Quasi-Newton Method for Online Convex Optimization, AISTATS'07, pp.436-443, 2007.

S. Shalev-Shwartz, Y. Singer, and N. Srebro, Pegasos: Primal Estimated sub-GrAdient SOlver for SVM, Proceedings of the 24th International Conference on Machine Learning, ICML '07, pp.807-814, 2007.
DOI : 10.1145/1273496.1273598

V. Vapnik, Statistical Learning Theory, 1998.

E. Vernet, R. Williamson, and M. Reid, Composite multiclass losses, NIPS*24, pp.1224-1232, 2011.

J. Wang, J. Yang, K. Yu, F. Lv, T. Huang et al., Locality-constrained Linear Coding for image classification, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.3360-3367, 2010.
DOI : 10.1109/CVPR.2010.5540018

J. Weston, S. Bengio, and N. Usunier, Wsabie: Scaling up to large vocabulary image annotation, IJCAI'11, pp.2764-2770, 2011.

J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba, SUN database: Large-scale scene recognition from abbey to zoo, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.3485-3492, 2010.
DOI : 10.1109/CVPR.2010.5539970
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.469.2228

T. Zhang, Solving large scale linear prediction problems using stochastic gradient descent algorithms, Twenty-First International Conference on Machine Learning, ICML '04, pp.116-123, 2004.
DOI : 10.1145/1015330.1015332