Fig. 2. MNIST one-vs-all experiment: Example of 8 handwritten digits identified as possibly misclassified by SPA (under 90% credibility intervals). The true label (black), the predicted one (green for correct decisions and orange for wrong ones), the second and third

D. R. Cox, The regression analysis of binary sequences, J. Roy. Stat. Soc. Ser. B, vol.20, issue.2, pp.215-242, 1958.

S. H. Walker and D. B. Duncan, Estimation of the probability of an event as a function of several independent variables, Biometrika, vol.54, issue.1-2, pp.167-179, 1967.

A. Y. Ng and M. I. Jordan, On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes, Adv. in Neural Information Process. Systems, pp.841-848, 2002.

A. Agresti, Logistic Regression, pp.165-210, 2003.

D. W. Hosmer, S. Lemeshow, and R. X. Sturdivant, Applied logistic regression, vol.398, 2013.

T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2001.

J. C. Marshall, D. J. Cook, N. Christou, G. R. Bernard, C. L. Sprung et al., Multiple organ dysfunction score: a reliable descriptor of a complex clinical outcome, Crit. Care. Med, vol.23, issue.10, pp.1638-1652, 1995.

H. Chuang, High school youths' dropout and re-enrollment behavior, Economics of Education Review, vol.16, issue.2, pp.171-186, 1997.

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Optimization with sparsity-inducing penalties, Found. Trends Mach. Learn, vol.4, issue.1, pp.1-106, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00613125

J. H. Albert and S. Chib, Bayesian analysis of binary and polychotomous response data, J. Amer. Stat. Assoc, vol.88, issue.422, pp.669-679, 1993.

C. C. Holmes and L. Held, Bayesian auxiliary variable models for binary and multinomial regression, Bayesian Anal, vol.1, issue.1, pp.145-168, 2006.

S. Frühwirth-Schnatter and R. Frühwirth, Data Augmentation and MCMC for Binary and Multinomial Logit Models, pp.111-132, 2010.

R. B. Gramacy and N. G. Polson, Simulation-based regularized logistic regression, Bayesian Anal, vol.7, issue.3, pp.567-590, 2012.

N. G. Polson, J. G. Scott, and J. Windle, Bayesian inference for logistic models using Pólya-Gamma latent variables, J. Amer. Stat. Assoc, vol.108, issue.504, pp.1339-1349, 2013.

M. Pereyra, Proximal Markov chain Monte Carlo algorithms, Stat. Comput, vol.26, issue.4, pp.745-760, 2016.

A. Durmus, E. Moulines, and M. Pereyra, Efficient Bayesian computation by proximal Markov chain Monte Carlo: When Langevin meets Moreau, SIAM J. Imag. Sci, vol.11, issue.1, pp.473-506, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01267115

M. Vono, P. Chainais, and N. Dobigeon, Split-and-augmented Gibbs sampler - Application to large-scale inference problems.

R. Rifkin and A. Klautau, In defense of one-vs-all classification, J. Mach. Learn. Res, vol.5, pp.101-141, 2004.

T. Park and G. Casella, The Bayesian lasso, J. Amer. Stat. Assoc, vol.103, issue.482, pp.681-686, 2008.

M. A. Figueiredo, Adaptive sparseness for supervised learning, IEEE Trans. Patt. Anal. Mach. Intell, vol.25, issue.9, pp.1150-1159, 2003.

D. L. Donoho and M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization, Proc. Nat. Academy of Science, vol.100, issue.5, pp.2197-2202, 2003.

A. Y. Ng, Feature selection, L1 vs. L2 regularization, and rotational invariance, Proc. Int. Conf. Machine Learning (ICML), 2004.

B. A. Olshausen and D. J. Field, Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature, vol.381, pp.607-609, 1996.

S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput, vol.20, issue.1, pp.33-61, 1998.

D. A. van Dyk and X. Meng, The art of data augmentation, J. Comput. Graph. Stat, vol.10, issue.1, pp.1-50, 2001.

L. M. Briceño-Arias, G. Chierchia, E. Chouzenoux, and J. Pesquet, A random block-coordinate Douglas-Rachford splitting method with low computational complexity for binary logistic regression, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01672507

A. Maignan and T. C. Scott, Fleshing out the generalized Lambert W function, ACM Commun. Comput. Algebra, vol.50, issue.2, pp.45-60, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01326771

I. Mező and A. Baricz, On the generalization of the Lambert W function, Trans. Amer. Math. Soc, vol.369, pp.7917-7934, 2017.

L. Wan, M. Zeiler, S. Zhang, Y. LeCun, and R. Fergus, Regularization of neural networks using DropConnect, Proc. Int. Conf. Machine Learning (ICML), vol.28, pp.1058-1066, 2013.

R. Bardenet, A. Doucet, and C. Holmes, On Markov chain Monte Carlo methods for tall data, J. Mach. Learn. Res, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01355287