M. Abramowitz and I. Stegun, Handbook of Mathematical Functions, 1965.

M. Aitkin, Posterior Bayes factors (with discussion), Journal of the Royal Statistical Society. Series B (Methodological), pp.111-142, 1991.

H. Akaike, Information theory and an extension of the maximum likelihood principle, 2nd International Symposium on Information Theory, pp.267-281, 1973.

H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control, vol.19, issue.6, pp.716-723, 1974.
DOI : 10.1109/TAC.1974.1100705

P. Alquier and K. Lounici, PAC-Bayesian bounds for sparse regression estimation with exponential weights, Electronic Journal of Statistics, vol.5, issue.0, pp.127-145, 2011.
DOI : 10.1214/11-EJS601

URL : https://hal.archives-ouvertes.fr/hal-00465801

M. Aminghafari, N. Cheze, and J. Poggi, Multivariate denoising using wavelets and principal component analysis, Computational Statistics & Data Analysis, vol.50, issue.9, pp.2381-2398, 2006.
DOI : 10.1016/j.csda.2004.12.010

URL : https://hal.archives-ouvertes.fr/hal-01633702

D. N. Anderson, A multivariate Linnik distribution, Statistics & Probability Letters, vol.14, issue.4, pp.333-336, 1992.
DOI : 10.1016/0167-7152(92)90067-F

T. Ando, Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models, Biometrika, vol.94, issue.2, pp.443-458, 2007.
DOI : 10.1093/biomet/asm017

T. Ando, Bayesian factor analysis with fat-tailed factors and its exact marginal likelihood, Journal of Multivariate Analysis, vol.100, issue.8, pp.1717-1726, 2009.
DOI : 10.1016/j.jmva.2009.02.001

T. Ando and R. Tsay, Predictive likelihood for Bayesian model selection and averaging, International Journal of Forecasting, vol.26, issue.4, pp.744-763, 2010.
DOI : 10.1016/j.ijforecast.2009.08.001

C. Archambeau and F. Bach, Sparse probabilistic projections, Advances in neural information processing systems, pp.73-80, 2009.

M. Arjovsky and L. Bottou, Towards principled methods for training generative adversarial networks. arXiv preprint, 2017.

F. Bach, Bolasso: model consistent Lasso estimation through the bootstrap, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.33-40, 2008.
DOI : 10.1145/1390156.1390161

URL : https://hal.archives-ouvertes.fr/hal-00271289

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Optimization with Sparsity-Inducing Penalties, Foundations and Trends in Machine Learning, vol.4, issue.1, pp.1-106, 2012.
DOI : 10.1561/2200000015

URL : https://hal.archives-ouvertes.fr/hal-00613125

J. Bai and S. Ng, Determining the Number of Factors in Approximate Factor Models, Econometrica, vol.70, issue.1, pp.191-221, 2002.
DOI : 10.1111/1468-0262.00273

R. Bapat and T. Raghavan, Nonnegative matrices and applications, 1997.
DOI : 10.1017/CBO9780511529979

M. Baragatti and D. Pommeret, A study of variable selection using g-prior distribution with ridge parameter, Computational Statistics & Data Analysis, vol.56, issue.6, pp.1920-1934, 2012.
DOI : 10.1016/j.csda.2011.11.017

URL : https://hal.archives-ouvertes.fr/hal-01293963

R. F. Barber, M. Drton, and K. M. Tan, Laplace Approximation in High-Dimensional Bayesian Regression, Statistical Analysis for High-Dimensional Data, pp.15-36
DOI : 10.1007/978-3-319-27099-9_2

O. Barndorff-Nielsen, J. Kent, and M. Sørensen, Normal Variance-Mean Mixtures and z Distributions, International Statistical Review / Revue Internationale de Statistique, vol.50, issue.2, pp.145-159, 1982.
DOI : 10.2307/1402598

M. Bartlett, A comment on D. V. Lindley's statistical paradox, Biometrika, vol.44, issue.3-4, pp.533-534, 1957.
DOI : 10.1093/biomet/44.3-4.533

J. Bayarri and J. Berger, Hypothesis testing and model uncertainty, Bayesian Theory and Applications, pp.361-394, 2013.
DOI : 10.1093/acprof:oso/9780199695607.003.0018

R. Bellman, Dynamic programming, 1957.

Y. Benjamini and Y. Hochberg, Controlling the false discovery rate: a practical and powerful approach to multiple testing, Journal of the Royal Statistical Society: Series B (Statistical Methodology), pp.289-300, 1995.

J. O. Berger, Statistical decision theory and Bayesian analysis (second edition), 1985.
DOI : 10.1007/978-1-4757-4286-2

J. O. Berger, Could Fisher, Jeffreys and Neyman Have Agreed on Testing?, Statistical Science, vol.18, issue.1, pp.1-32, 2003.
DOI : 10.1214/ss/1056397485

URL : http://doi.org/10.1214/ss/1056397485

J. O. Berger and L. R. Pericchi, The Intrinsic Bayes Factor for Model Selection and Prediction, Journal of the American Statistical Association, vol.91, issue.433, pp.109-122, 1996.
DOI : 10.2307/2348514

J. M. Bernardo and A. F. Smith, Bayesian theory, 1994.
DOI : 10.1002/9780470316870

M. Bertoletti, N. Friel, and R. Rastelli, Choosing the number of clusters in a finite mixture model using an exact integrated completed likelihood criterion, METRON, vol.22, issue.2, pp.177-199, 2015.
DOI : 10.1007/s11222-011-9233-4

C. Biernacki, G. Celeux, and G. Govaert, Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models, Computational Statistics & Data Analysis, vol.41, issue.3-4, pp.561-575, 2003.
DOI : 10.1016/S0167-9473(02)00163-9

L. Birgé and P. Massart, Minimal penalties for Gaussian model selection. Probability theory and related fields, pp.33-73, 2007.

C. Bishop, Pattern Recognition and Machine Learning, 2006.

C. M. Bishop, Bayesian PCA, Advances in neural information processing systems, pp.382-388, 1999.

C. M. Bishop, Variational principal components, 9th International Conference on Artificial Neural Networks: ICANN '99, pp.509-514, 1999.
DOI : 10.1049/cp:19991160

URL : http://www.research.microsoft.com/~cmbishop/downloads/Bishop-VPCA-ICANN-99.pdf

D. M. Blei, A. Kucukelbir, and J. D. McAuliffe, Variational Inference: A Review for Statisticians, Journal of the American Statistical Association, vol.112, issue.518, pp.859-877, 2017.

S. Boucheron, G. Lugosi, and P. Massart, Concentration inequalities: A nonasymptotic theory of independence, 2013.
DOI : 10.1093/acprof:oso/9780199535255.001.0001

URL : https://hal.archives-ouvertes.fr/hal-00794821

L. Bouranis, N. Friel, and F. Maire, Bayesian model selection for exponential random graph models via adjusted pseudolikelihoods, 2017.

C. Bouveyron and C. Brunet, Simultaneous model-based clustering and visualization in the Fisher discriminative subspace, Statistics and Computing, vol.20, issue.2, pp.301-324, 2012.
DOI : 10.1016/j.csda.2006.12.046

URL : https://hal.archives-ouvertes.fr/hal-00492406

C. Bouveyron, S. Girard, and C. Schmid, High-Dimensional Discriminant Analysis, Communications in Statistics - Theory and Methods, vol.1, issue.14, pp.2607-2623, 2007.
DOI : 10.1214/aos/1176344136

URL : https://hal.archives-ouvertes.fr/inria-00548516

C. Bouveyron, S. Girard, and C. Schmid, High-dimensional data clustering, Computational Statistics & Data Analysis, vol.52, issue.1, pp.502-519, 2007.
DOI : 10.1016/j.csda.2007.02.009

URL : https://hal.archives-ouvertes.fr/inria-00548591

C. Bouveyron, G. Celeux, and S. Girard, Intrinsic dimension estimation by maximum likelihood in isotropic probabilistic PCA, Pattern Recognition Letters, vol.32, issue.14, pp.1706-1713, 2011.
DOI : 10.1016/j.patrec.2011.07.017

URL : https://hal.archives-ouvertes.fr/hal-00440372

C. Bouveyron, E. Côme, and J. Jacques, The discriminative functional mixture model for a comparative analysis of bike sharing systems, The Annals of Applied Statistics, vol.9, issue.4, pp.1726-1760, 2015.
DOI : 10.1214/15-AOAS861

URL : https://hal.archives-ouvertes.fr/hal-01024186

C. Bouveyron, P. Latouche, and P. Mattei, Bayesian variable selection for globally sparse probabilistic PCA, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01310409

C. Bouveyron, P. Latouche, and P. Mattei, Exact dimensionality selection for Bayesian PCA, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01484099

G. E. Box, Robustness in the Strategy of Scientific Model Building, Robustness in statistics, vol.1, pp.201-236, 1979.
DOI : 10.1016/B978-0-12-438150-6.50018-2

L. Breiman, Random forests, Machine Learning, vol.45, issue.1, pp.5-32, 2001.
DOI : 10.1023/A:1010933404324

L. Breiman and J. Friedman, Estimating Optimal Transformations for Multiple Regression and Correlation, Journal of the American Statistical Association, vol.80, issue.391, pp.580-598, 1985.
DOI : 10.1007/BF02296972

W. Breymann and D. Lüthi, ghyp: A package on generalized hyperbolic distributions, 2013.

R. Bro, K. Kjeldahl, A. K. Smilde, and H. A. Kiers, Cross-validation of component models: A critical look at current methods, Analytical and Bioanalytical Chemistry, vol.55, issue.5, pp.1241-1251, 2008.
DOI : 10.1007/s00216-007-1790-1

M. J. Brusco, A comparison of simulated annealing algorithms for variable selection in principal component analysis and discriminant analysis, Computational Statistics & Data Analysis, vol.77, pp.38-53, 2014.
DOI : 10.1016/j.csda.2014.03.001

R. Byrd, P. Lu, J. Nocedal, and C. Zhu, A Limited Memory Algorithm for Bound Constrained Optimization, SIAM Journal on Scientific Computing, vol.16, issue.5, pp.1190-1208, 1995.
DOI : 10.1137/0916069

E. J. Candès, Mathematics of sparsity (and a few other things), Proceedings of the International Congress of Mathematicians, 2014.

E. J. Candès and T. Tao, The Dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics, pp.2313-2351, 2007.

E. J. Candès and T. Tao, The Power of Convex Relaxation: Near-Optimal Matrix Completion, IEEE Transactions on Information Theory, vol.56, issue.5, pp.2053-2080, 2010.
DOI : 10.1109/TIT.2010.2044061

P. Carbonetto and M. Stephens, Scalable Variational Inference for Bayesian Variable Selection in Regression, and Its Accuracy in Genetic Association Studies, Bayesian Analysis, vol.7, issue.1, pp.73-108, 2012.
DOI : 10.1214/12-BA703

B. P. Carlin and S. Chib, Bayesian model choice via Markov chain Monte Carlo methods, Journal of the Royal Statistical Society. Series B (Methodological), pp.473-484, 1995.

B. Carpenter, A. Gelman, M. Hoffman, D. Lee, B. Goodrich et al., Stan: A Probabilistic Programming Language, Journal of Statistical Software, vol.76, issue.1, pp.1-37, 2016.
DOI : 10.18637/jss.v076.i01

O. Catoni, PAC-Bayesian Supervised Classification, Lecture Notes-Monograph Series, Institute of Mathematical Statistics, vol.56, 2007.
DOI : 10.1007/978-3-319-21852-6_20

URL : https://hal.archives-ouvertes.fr/hal-00206119

G. Celeux and G. Govaert, A classification EM algorithm for clustering and two stochastic versions, Computational Statistics & Data Analysis, vol.14, issue.3, pp.315-332, 1992.
DOI : 10.1016/0167-9473(92)90042-E

URL : https://hal.archives-ouvertes.fr/inria-00075196

G. Celeux, M. Anbari, J. Marin, and C. P. Robert, Regularization in Regression: Comparing Bayesian and Frequentist Methods in a Poorly Informative Situation, Bayesian Analysis, vol.7, issue.2, pp.477-502, 2012.
DOI : 10.1214/12-BA716

URL : https://hal.archives-ouvertes.fr/hal-00943727

J. M. Chambers, W. S. Cleveland, B. Kleiner, and P. A. Tukey, Graphical methods for data analysis, 1983.

T. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng et al., PCANet: A simple deep learning baseline for image classification?, IEEE Transactions on Image Processing, vol.24, issue.12, pp.5017-5032, 2015.

W. Chang, On Using Principal Components Before Separating a Mixture of Two Multivariate Normal Distributions, Applied Statistics, vol.32, issue.3, pp.267-275, 1983.
DOI : 10.2307/2347949

D. Chatterjee, T. Maitra, and S. Bhattacharya, A short note on almost sure convergence of Bayes factors in the general set-up. arXiv preprint, 2017.

M. Chen, Q. Shao, and J. G. Ibrahim, Monte Carlo methods in Bayesian computation, 2000.
DOI : 10.1007/978-1-4612-1276-8

S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic Decomposition by Basis Pursuit, SIAM Journal on Scientific Computing, vol.20, issue.1, pp.33-61, 1998.
DOI : 10.1137/S1064827596304010

S. Chib and T. A. Kuffner, Bayes factor consistency. arXiv preprint, 2016.

M. Clyde and E. I. George, Model uncertainty, Statistical science, vol.19, pp.81-94, 2004.

C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, vol.1, issue.3, pp.273-297, 1995.
DOI : 10.1007/BF00994018

K. Csilléry, M. G. Blum, O. E. Gaggiotti, and O. François, Approximate Bayesian computation (ABC) in practice. Trends in ecology & evolution, pp.410-418, 2010.

W. Cui and E. I. George, Empirical Bayes vs. fully Bayes variable selection, Journal of Statistical Planning and Inference, vol.138, issue.4, pp.888-900, 2008.
DOI : 10.1016/j.jspi.2007.02.011

A. d'Aspremont, F. Bach, and L. El Ghaoui, Optimal solutions for sparse principal component analysis, The Journal of Machine Learning Research, vol.9, pp.1269-1294, 2008.

A. P. Dawid, Statistical theory: The prequential approach, Journal of the Royal Statistical Society. Series A, pp.278-292, 1984.

A. P. Dawid, Posterior model probabilities, Philosophy of Statistics, pp.607-630, 2011.

F. De la Torre and T. Kanade, Discriminative cluster analysis, Proceedings of the 23rd international conference on Machine learning, ICML '06, pp.241-248, 2006.
DOI : 10.1145/1143844.1143875

C. Deledalle, J. Salmon, and A. S. Dalalyan, Image denoising with patch based PCA: local versus global, Procedings of the British Machine Vision Conference 2011, pp.25-26, 2011.
DOI : 10.5244/C.25.25

URL : https://hal.archives-ouvertes.fr/hal-00654289

P. Dellaportas, J. J. Forster, and I. Ntzoufras, Joint Specification of Model Space and Parameter Space Prior Distributions, Statistical Science, vol.27, issue.2, pp.232-246, 2012.
DOI : 10.1214/11-STS369

A. Dempster, N. Laird, and D. Rubin, Maximum likelihood for incomplete data via the EM algorithm (with discussion), Journal of the Royal Statistical Society. Series B (Methodological ), vol.39, pp.1-38, 1977.

P. J. Diggle and R. J. Gratton, Monte Carlo methods of inference for implicit statistical models, Journal of the Royal Statistical Society. Series B (Methodological), pp.193-227, 1984.

P. Ding and J. K. Blitzstein, On the Gaussian mixture representation of the Laplace distribution, The American Statistician, in press, 2017.

S. Drăghici, Statistics and data analysis for microarrays using R and Bioconductor, 2012.

D. Draper, Assessment and propagation of model uncertainty (with discussion), Journal of the Royal Statistical Society. Series B (Methodological), pp.45-97, 1995.

D. Draper, Comment on Bayesian model averaging: A tutorial, Statistical Science, vol.14, issue.4, pp.405-409, 1999.

M. Drton and M. Plummer, A Bayesian information criterion for singular models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.79, issue.2, pp.323-380, 2017.
DOI : 10.1080/10618600.2015.1060885

B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Least angle regression. The Annals of statistics, pp.407-499, 2004.

T. Eltoft, T. Kim, and T. Lee, On the multivariate Laplace distribution, IEEE Signal Processing Letters, vol.13, issue.5, pp.300-303, 2006.
DOI : 10.1109/LSP.2006.870353

A. Erkanli, Laplace Approximations for Posterior Expectations When the Mode Occurs at the Boundary of the Parameter Space, Journal of the American Statistical Association, vol.41, issue.425, pp.250-258, 1994.
DOI : 10.1214/ss/1177012384

A. Etz and E. J. Wagenmakers, J. B. S. Haldane's Contribution to the Bayes Factor Hypothesis Test, Statistical Science, vol.32, issue.2, pp.313-329, 2017.
DOI : 10.1214/16-STS599

N. Evangelopoulos, X. Zhang, and V. R. Prybutok, Latent Semantic Analysis: five methodological recommendations, European Journal of Information Systems, vol.51, issue.2, pp.70-86, 2012.
DOI : 10.1016/j.csda.2005.09.010

J. Fan and J. Lv, Sure independence screening for ultrahigh dimensional feature space, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.70, issue.5, pp.849-911, 2008.
DOI : 10.1255/jnirs.271

K. Fang, S. Kotz, and K. W. Ng, Symmetric multivariate and related distributions, 1990.
DOI : 10.1007/978-1-4899-2937-2

R. A. Fisher, Presidential address, Sankhyā: The Indian Journal of Statistics, pp.14-17, 1938.

M. Fop and T. B. Murphy, Variable selection methods for model-based clustering. arXiv preprint, 2017.

C. Fraley and A. E. Raftery, Model-Based Clustering, Discriminant Analysis, and Density Estimation, Journal of the American Statistical Association, vol.97, issue.458, pp.611-631, 2002.
DOI : 10.1198/016214502760047131

B. C. Franczak, R. P. Browne, and P. D. McNicholas, Mixtures of Shifted Asymmetric Laplace Distributions, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.36, issue.6, pp.1149-1157, 2014.
DOI : 10.1109/TPAMI.2013.216

N. Friel and A. N. Pettitt, Marginal likelihood estimation via power posteriors, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.70, issue.3, pp.589-607, 2008.
DOI : 10.1002/0470011084

URL : http://www.stats.gla.ac.uk/research/TechRep2005/05.10.pdf

N. Friel and J. Wyse, Estimating the evidence - a review, Statistica Neerlandica, vol.6, issue.3, pp.288-308, 2012.
DOI : 10.1214/11-BA620

N. Friel, J. P. Mckeone, C. J. Oates, and A. N. Pettitt, Investigation of the widely applicable Bayesian information criterion, Statistics and Computing, vol.14, issue.1, pp.833-844, 2017.
DOI : 10.1111/j.0006-341X.2000.00256.x

W. Fu, Penalized regressions: the bridge versus the lasso, Journal of computational and graphical statistics, vol.7, issue.3, pp.397-416, 1998.

T. Gao and V. Jojic, Degrees of freedom in deep neural networks, Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2016.

R. E. Gaunt, Variance-Gamma approximation via Stein's method, Electronic Journal of Probability, vol.19, issue.0, pp.1-33, 2014.
DOI : 10.1214/EJP.v19-3020

URL : http://doi.org/10.1214/ejp.v19-3020

M. Gavish and D. L. Donoho, The Optimal Hard Threshold for Singular Values is $4/\sqrt{3}$, IEEE Transactions on Information Theory, vol.60, issue.8, pp.5040-5053, 2014.
DOI : 10.1109/TIT.2014.2323359

A. Gelman and C. R. Shalizi, Philosophy and the practice of Bayesian statistics, British Journal of Mathematical and Statistical Psychology, vol.1, issue.1, pp.8-38, 2013.
DOI : 10.1017/CBO9780511541391

E. I. George and D. Foster, Calibration and empirical Bayes variable selection, Biometrika, vol.87, issue.4, pp.731-747, 2000.
DOI : 10.1093/biomet/87.4.731

URL : http://diskworld.wharton.upenn.edu/research/./EBVS00.pdf

E. I. George and R. E. McCulloch, Variable Selection via Gibbs Sampling, Journal of the American Statistical Association, vol.88, issue.423, pp.881-889, 1993.
DOI : 10.1007/BF01889985

P. Germain, F. Bach, A. Lacoste, and S. Lacoste-julien, PAC-Bayesian theory meets Bayesian inference, Advances in Neural Information Processing Systems, pp.1884-1892, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01324072

T. Gneiting and A. E. Raftery, Strictly Proper Scoring Rules, Prediction, and Estimation, Journal of the American Statistical Association, vol.102, issue.477, pp.359-378, 2007.
DOI : 10.1198/016214506000001437

I. J. Good, Rational Decisions, Journal of the Royal Statistical Society. Series B (Methodological ), pp.107-114, 1952.
DOI : 10.1007/978-1-4612-0919-5_24

I. J. Good, Studies in the History of Probability and Statistics. XXXVII A. M. Turing's statistical work in World War II, Biometrika, vol.66, issue.2, pp.393-396, 1979.
DOI : 10.1093/biomet/66.2.393

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley et al., Generative adversarial nets, Advances in neural information processing systems, pp.2672-2680, 2014.

I. Goodfellow, Y. Bengio, and A. Courville, Deep learning, 2016.

A. Gramfort, D. Strohmeier, J. Haueisen, M. S. Hämäläinen, and M. Kowalski, Time-frequency mixed-norm estimates: Sparse M/EEG imaging with non-stationary source activations, NeuroImage, vol.70, pp.410-422, 2013.
DOI : 10.1016/j.neuroimage.2012.12.051

URL : https://hal.archives-ouvertes.fr/hal-00773276

Y. Grandvalet, J. Chiquet, and C. Ambroise, Sparsity by worst-case quadratic penalties. arXiv preprint arXiv, 1210.

P. J. Green, Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika, vol.82, issue.4, pp.711-732, 1995.
DOI : 10.1093/biomet/82.4.711

Q. Gu, Z. Li, and J. Han, Joint feature selection and subspace learning, Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), p.1294, 2011.

Y. Guan and J. G. Dy, Sparse probabilistic principal component analysis, International Conference on Artificial Intelligence and Statistics, pp.185-192, 2009.

I. Guyon, A. Saffari, G. Dror, and G. Cawley, Model selection: Beyond the Bayesian/frequentist divide, The Journal of Machine Learning Research, vol.11, pp.61-87, 2010.

J. B. Haldane, A note on inverse probability, Mathematical Proceedings of the Cambridge Philosophical Society, pp.55-61, 1932.
DOI : 10.1098/rsta.1922.0009

C. Han and B. P. Carlin, Markov Chain Monte Carlo Methods for Computing Bayes Factors, Journal of the American Statistical Association, vol.96, issue.455, pp.1122-1132, 2001.
DOI : 10.1198/016214501753208780

A. Hannachi, I. T. Jolliffe, D. B. Stephenson, and N. Trendafilov, In search of simple structures in climate: simplifying EOFs, International Journal of Climatology, vol.11, issue.1, pp.7-28, 2006.
DOI : 10.1017/CBO9780511612336

P. Hartman and G. S. Watson, "Normal" distribution functions on spheres and the modified Bessel functions, The Annals of Probability, pp.593-607, 1974.

D. I. Hastie and P. J. Green, Model choice using reversible jump Markov chain Monte Carlo, Statistica Neerlandica, vol.69, issue.3, pp.309-338, 2012.
DOI : 10.1111/j.1751-5823.2001.tb00479.x

T. Hastie, R. Tibshirani, and M. Wainwright, Statistical learning with sparsity: the lasso and generalizations, 2015.

D. Haughton, On the choice of a model to fit data from an exponential family. The Annals of Statistics, pp.342-355, 1988.

K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.770-778, 2016.
DOI : 10.1109/CVPR.2016.90

S. Hee, W. J. Handley, M. P. Hobson, and A. N. Lasenby, Bayesian model selection without evidences: application to the dark energy equation-of-state, Monthly Notices of the Royal Astronomical Society, vol.455, issue.3, pp.2461-2473, 2016.
DOI : 10.1093/mnras/stv2217

D. Hernández-Lobato, J. M. Hernández-Lobato, and P. Dupont, Generalized spike-and-slab priors for Bayesian group feature selection using expectation propagation, The Journal of Machine Learning Research, vol.14, issue.1, pp.1891-1945, 2013.

J. A. Hoeting, D. Madigan, A. E. Raftery, and C. T. Volinsky, Bayesian model averaging: a tutorial (with discussion), Statistical Science, pp.382-401, 1999.

P. D. Hoff, Model Averaging and Dimension Selection for the Singular Value Decomposition, Journal of the American Statistical Association, vol.102, issue.478, pp.674-685, 2007.
DOI : 10.1198/016214506000001310

C. C. Holmes, F. Caron, J. E. Griffin, and D. A. Stephens, Two-sample Bayesian Nonparametric Hypothesis Testing, Bayesian Analysis, vol.10, issue.2, pp.297-320, 2015.
DOI : 10.1214/14-BA914

URL : https://hal.archives-ouvertes.fr/hal-00733547

H. Hotelling, Analysis of a complex of statistical variables into principal components., Journal of Educational Psychology, vol.24, issue.6, p.417, 1933.
DOI : 10.1037/h0071325

D. C. Hoyle, Automatic PCA dimension selection for high dimensional data and small sample sizes, Journal of Machine Learning Research, vol.9, pp.2733-2759, 2008.

C. K. Hsiao, Approximate Bayes Factors When a Mode Occurs on the Boundary, Journal of the American Statistical Association, vol.31, issue.438, pp.656-663, 1997.
DOI : 10.2307/2347977

X. Huang, J. Wang, and F. Liang, A variational algorithm for Bayesian variable selection. arXiv preprint, 2016.

L. Hubert and P. Arabie, Comparing partitions, Journal of Classification, vol.78, issue.1, pp.193-218, 1985.
DOI : 10.1007/978-3-642-69024-2_27

F. Huszár, Variational inference using implicit distributions. arXiv preprint, 2017.

A. Ilin and T. Raiko, Practical approaches to principal component analysis in the presence of missing values, The Journal of Machine Learning Research, vol.11, pp.1957-2000, 2010.

H. Ishwaran and J. Rao, Spike and Slab Gene Selection for Multigroup Microarray Data, Journal of the American Statistical Association, vol.100, issue.471, pp.764-780, 2005.
DOI : 10.1198/016214505000000051

H. Ishwaran and J. Rao, Spike and slab variable selection: frequentist and Bayesian strategies, The Annals of Statistics, pp.730-773, 2005.
DOI : 10.1214/009053604000001147

URL : http://doi.org/10.1214/009053604000001147

H. Ishwaran, U. Kogalur, and J. Rao, spikeslab: Prediction and variable selection using spike and slab regression, The R Journal, vol.2, issue.2, 2010.

D. A. Jackson, Stopping Rules in Principal Components Analysis: A Comparison of Heuristical and Statistical Approaches, Ecology, vol.74, issue.8, pp.2204-2214, 1993.
DOI : 10.2307/1939574

W. H. Jefferys and J. O. Berger, Ockham's razor and Bayesian analysis, American Scientist, vol.80, issue.1, pp.64-72, 1992.

H. Jeffreys, Theory of Probability, 1939.

H. Jeffreys, Theory of Probability, 1961.

R. Jenatton, G. Obozinski, and F. Bach, Structured sparse principal component analysis, International Conference on Artificial Intelligence and Statistics, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00414158

R. Jenatton, J. Audibert, and F. Bach, Structured variable selection with sparsity-inducing norms, Journal of Machine Learning Research, vol.12, pp.2777-2824, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00377732

V. E. Johnson and D. Rossell, Bayesian Model Selection in High-Dimensional Settings, Journal of the American Statistical Association, vol.107, issue.498, pp.649-660, 2012.
DOI : 10.1111/j.1467-9868.2005.00503.x

I. M. Johnstone and A. Y. Lu, On Consistency and Sparsity for Principal Components Analysis in High Dimensions, Journal of the American Statistical Association, vol.104, issue.486, 2009.
DOI : 10.1198/jasa.2009.0121

I. T. Jolliffe, Discarding Variables in a Principal Component Analysis. I: Artificial Data, Applied statistics, pp.160-173, 1972.
DOI : 10.2307/2346488

I. T. Jolliffe, Discarding Variables in a Principal Component Analysis. II: Real Data, Applied Statistics, vol.22, issue.1, pp.21-31, 1973.
DOI : 10.2307/2346300

I. T. Jolliffe, Principal component analysis, 2002.

I. T. Jolliffe and J. Cadima, Principal component analysis: a review and recent developments, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol.374, issue.2065, 2016.
DOI : 10.1016/j.csda.2008.05.028

M. I. Jordan, What are the open problems in Bayesian statistics? The ISBA Bulletin, p.568, 2011.

J. Josse and F. Husson, Selecting the number of components in principal component analysis using cross-validation approximations, Computational Statistics & Data Analysis, vol.56, issue.6, pp.1869-1879, 2012.
DOI : 10.1016/j.csda.2011.11.012

M. Journée, Geometric algorithms for component analysis with a view to gene expression data analysis, 2009.

M. Journée, Y. Nesterov, P. Richtárik, and R. Sepulchre, Generalized power method for sparse principal component analysis, The Journal of Machine Learning Research, vol.11, pp.517-553, 2010.

K. Kamary, K. Mengersen, C. P. Robert, and J. Rousseau, Testing hypotheses via a mixture estimation model. arXiv preprint arXiv:1412, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01517681

R. E. Kass and A. E. Raftery, Bayes Factors, Journal of the American Statistical Association, vol.90, issue.430, pp.773-795, 1995.
DOI : 10.1214/ss/1177013241

R. E. Kass and D. Steffey, Approximate Bayesian Inference in Conditionally Independent Hierarchical Models (Parametric Empirical Bayes Models), Journal of the American Statistical Association, vol.84, issue.407, pp.717-726, 1989.
DOI : 10.1080/01621459.1989.10478736

R. E. Kass and L. Wasserman, A Reference Bayesian Test for Nested Hypotheses and its Relationship to the Schwarz Criterion, Journal of the American Statistical Association, vol.90, issue.431, pp.928-934, 1995.
DOI : 10.1214/aos/1176344136

R. E. Kass, L. Tierney, and J. B. Kadane, The validity of posterior expansions based on Laplace's method, Bayesian and likelihood methods in statistics and econometrics, pp.473-488, 1990.

J. M. Keynes, A Treatise on Probability, 1921.
DOI : 10.1007/978-1-349-00843-8

Z. Khan, F. Shafait, and A. Mian, Joint Group Sparse PCA for Compressed Hyperspectral Imaging, IEEE Transactions on Image Processing, vol.24, issue.12, pp.4934-4942, 2015.
DOI : 10.1109/TIP.2015.2472280

R. Khanna, J. Ghosh, R. Poldrack, and O. Koyejo, Sparse submodular probabilistic PCA, Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, pp.453-461, 2015.

S. Kotz, T. J. Kozubowski, and K. Podgorski, The Laplace distribution and generalizations: a revisit with applications to communications, economics, engineering, and finance, Number 183, 2001.
DOI : 10.1007/978-1-4612-0173-1

T. J. Kozubowski, K. Podgórski, and I. Rychlik, Multivariate generalized Laplace distribution and related random fields, Journal of Multivariate Analysis, vol.113, pp.59-72, 2013.
DOI : 10.1016/j.jmva.2012.02.010

URL : https://doi.org/10.1016/j.jmva.2012.02.010

N. Kraemer, J. Schaefer, and A. Boulesteix, Regularized estimation of large-scale gene regulatory networks using Gaussian graphical models, BMC Bioinformatics, vol.10, article 384, 2009.

A. Kucukelbir, D. Tran, R. Ranganath, A. Gelman, and D. M. Blei, Automatic differentiation variational inference, Journal of Machine Learning Research, vol.18, issue.14, pp.1-45, 2017.

J. Kuha, AIC and BIC, Sociological Methods & Research, vol.57, issue.2, pp.188-229, 2004.
DOI : 10.2307/2096242

A. Kyprianou, Fluctuations of Lévy processes with applications: Introductory Lectures, 2014.
DOI : 10.1007/978-3-642-37632-0

H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio, An empirical evaluation of deep architectures on problems with many factors of variation, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.473-480, 2007.
DOI : 10.1145/1273496.1273556

P. Latouche, E. Birmelé, and C. Ambroise, Variational Bayesian inference and complexity control for stochastic block models, Statistical Modelling: An International Journal, vol.41, issue.1, pp.93-115, 2012.
DOI : 10.1016/j.patcog.2008.06.019

URL : https://hal.archives-ouvertes.fr/hal-00624536

P. Latouche, E. Birmelé, and C. Ambroise, Model selection in overlapping stochastic block models, Electronic Journal of Statistics, vol.8, issue.1, pp.762-794, 2014.
DOI : 10.1214/14-EJS903

URL : https://hal.archives-ouvertes.fr/hal-00990277

P. Latouche, P. Mattei, C. Bouveyron, and J. Chiquet, Combining a relaxed EM algorithm with Occam's razor for Bayesian variable selection in high-dimensional regression, Journal of Multivariate Analysis, vol.146, pp.177-190, 2016.
DOI : 10.1016/j.jmva.2015.09.004

M. Lavine and M. J. Schervish, Bayes factors: what they are and what they are not. The American Statistician, pp.119-122, 1999.
DOI : 10.2307/2685729

URL : ftp://ftp.isds.duke.edu/pub/WorkingPapers/97-02.ps

D. N. Lawley, A modified method of estimation in factor analysis and some large sample results, Proceedings of the Uppsala Symposium on Psychological Factor Analysis, pp.35-42, 1953.

M. Lázaro-Gredilla and M. K. Titsias, Spike and slab variational inference for multi-task and multiple kernel learning, Advances in neural information processing systems, pp.2339-2347, 2011.

Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015.
DOI : 10.1007/s10994-013-5335-x

M. Ledoux, The concentration of measure phenomenon, p.140, 2001.
DOI : 10.1090/surv/089

A. M. Legendre, Nouvelles méthodes pour la détermination des orbites des comètes, F. Didot, 1805.

F. Liang, R. Paulo, G. Molina, M. A. Clyde, and J. O. Berger, Mixtures of g Priors for Bayesian Variable Selection, Journal of the American Statistical Association, vol.103, issue.481, pp.410-423, 2008.
DOI : 10.1198/016214507000001337

B. Liebmann, A. Friedl, and K. Varmuza, Determination of glucose and ethanol in bioethanol production by near infrared spectroscopy and chemometrics, Analytica Chimica Acta, vol.642, issue.1-2, pp.171-178, 2009.
DOI : 10.1016/j.aca.2008.10.069

S. Lin, B. Sturmfels, and Z. Xu, Marginal likelihood integrals for mixtures of independence models, Journal of Machine Learning Research, vol.10, pp.1611-1631, 2009.

D. V. Lindley, A statistical paradox, Biometrika, vol.44, issue.1-2, pp.187-192, 1957.
DOI : 10.1093/biomet/44.1-2.187

D. V. Lindley, Some comments on Bayes factors, Journal of Statistical Planning and Inference, vol.61, issue.1, pp.181-189, 1997.
DOI : 10.1016/S0378-3758(96)00189-9

T. Liu, L. Trinchera, A. Tenenhaus, D. Wei, and A. O. Hero, Globally Sparse PLS Regression, New Perspectives in Partial Least Squares and Related Methods, pp.117-127, 2013.
DOI : 10.1007/978-1-4614-8283-3_7

URL : https://hal.archives-ouvertes.fr/hal-01069009

B. A. Logsdon, G. E. Hoffman, and J. G. Mezey, A variational Bayes algorithm for fast and accurate multiple locus genome-wide association analysis, BMC Bioinformatics, vol.11, issue.1, p.58, 2010.
DOI : 10.1186/1471-2105-11-58

H. F. Lopes and M. West, Bayesian model assessment in factor analysis, Statistica Sinica, pp.41-67, 2004.

L. Lorch, Inequalities for some Whittaker functions, Archivum Mathematicum, vol.3, issue.1, pp.1-9, 1967.

M. Luck, G. Bertho, M. Bateson, A. Karras, A. Yartseva et al., Rule-Mining for the Early Prediction of Chronic Kidney Disease Based on Metabolomics and Multi-Source Data, PLOS ONE, vol.11, issue.12, e0166905, 2016.
DOI : 10.1371/journal.pone.0166905

D. J. MacKay, Bayesian methods for adaptive models, 1991.

D. J. MacKay, Bayesian Interpolation, Neural Computation, vol.4, issue.3, pp.415-447, 1992.
DOI : 10.1093/comjnl/11.2.185

D. J. MacKay, A Practical Bayesian Framework for Backpropagation Networks, Neural Computation, vol.4, issue.3, pp.448-472, 1992.
DOI : 10.1038/323533a0

D. J. MacKay, Bayesian Methods for Backpropagation Networks, Models of neural networks III, pp.211-254, 1994.
DOI : 10.1007/978-1-4612-0723-8_6

D. J. MacKay, Comparison of Approximate Methods for Handling Hyperparameters, Neural Computation, vol.11, issue.5, pp.1035-1068, 1999.
DOI : 10.1007/BF01437407

D. J. MacKay, Information theory, inference and learning algorithms, 2003.

D. B. Madan, P. P. Carr, and E. C. Chang, The Variance Gamma Process and Option Pricing, Review of Finance, vol.2, issue.1, pp.79-105, 1998.
DOI : 10.1023/A:1009703431535

D. Madigan and A. E. Raftery, Model Selection and Accounting for Model Uncertainty in Graphical Models Using Occam's Window, Journal of the American Statistical Association, vol.89, issue.428, pp.1535-1546, 1994.
DOI : 10.2307/2529496

M. Mächler, Bessel: Bessel Functions Computations and Approximations, 2013. URL https://CRAN.R-project.org/package=Bessel. R package version 0.5-5.

A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey, Adversarial autoencoders, 2015.

M. Marbac and M. Sedki, Variable selection for model-based clustering using the integrated complete-data likelihood, Statistics and Computing, vol.27, issue.4, pp.1049-1063, 2017.
DOI : 10.1198/jasa.2010.tm09415

URL : https://hal.archives-ouvertes.fr/hal-01108878

J. Marin and C. P. Robert, Importance sampling methods for Bayesian discrimination between embedded models. Frontiers of Statistical Decision Making and Bayesian Analysis, pp.513-527, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00424475

J. Marin and C. P. Robert, Bayesian essentials with R, 2014.
DOI : 10.1007/978-1-4614-8687-9

URL : https://hal.archives-ouvertes.fr/hal-01337395

J. Marin, P. Pudlo, C. P. Robert, and R. J. Ryder, Approximate Bayesian computational methods, Statistics and Computing, vol.6, issue.31, pp.1-14, 2012.
DOI : 10.1098/rsif.2008.0172

URL : https://hal.archives-ouvertes.fr/hal-00567240

J. Marin, P. Pudlo, A. Estoup, and C. P. Robert, Likelihood-free model choice, 2015.
URL : https://hal.archives-ouvertes.fr/hal-00660474

H. Markowitz, Portfolio selection, The Journal of Finance, pp.77-91, 1952.

M. Masaeli, Y. Yan, Y. Cui, G. Fung, and J. G. Dy, Convex Principal Feature Selection, SIAM International Conference on Data Mining, pp.619-628, 2010.
DOI : 10.1137/1.9781611972801.54

URL : http://www.siam.org/proceedings/datamining/2010/dm10_054_masaelim.pdf

P. Mattei, Multiplying a Gaussian matrix by a Gaussian vector, Statistics & Probability Letters, vol.128, pp.67-70, 2017.
DOI : 10.1016/j.spl.2017.04.004

URL : https://hal.archives-ouvertes.fr/hal-01462941

P. Mattei, Discussion on "A Bayesian information criterion for singular models", Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.79, issue.2, pp.370-371, 2017.

C. Maugis, G. Celeux, and M. Martin-magniette, Variable selection in model-based discriminant analysis, Journal of Multivariate Analysis, vol.102, issue.10, pp.1374-1387, 2011.
DOI : 10.1016/j.jmva.2011.05.004

URL : https://hal.archives-ouvertes.fr/inria-00483229

D. G. Mayo and A. Spanos, Error and inference, 2009.
DOI : 10.1017/CBO9780511657528

R. Mazumder and P. Radchenko, The Discrete Dantzig Selector: Estimating Sparse Linear Models via Mixed Integer Linear Optimization, IEEE Transactions on Information Theory, vol.63, issue.5, pp.3053-3075, 2017.
DOI : 10.1109/TIT.2017.2658023

D. A. McAllester, Some PAC-Bayesian theorems, Proceedings of the eleventh annual conference on Computational learning theory, COLT '98, pp.230-234, 1998.
DOI : 10.1145/279943.279989

J. McElhinney, G. Downey, and T. Fearn, Chemometric processing of visible and near infrared reflectance spectra for species identification in selected raw homogenised meats, Journal of Near Infrared Spectroscopy, vol.7, issue.1, pp.145-154, 1999.
DOI : 10.1255/jnirs.245

D. McGraw and J. Wagner, Elliptically symmetric distributions, IEEE Transactions on Information Theory, vol.14, issue.1, pp.110-120, 1968.
DOI : 10.1109/TIT.1968.1054081

G. McLachlan and T. Krishnan, The EM Algorithm and Extensions (second edition), 2008.

S. M. McNicholas, P. D. McNicholas, and R. P. Browne, Mixtures of variance-gamma distributions, arXiv preprint, 2013.

N. Meinshausen and P. Bühlmann, Stability selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.72, issue.4, 2010.
DOI : 10.1186/1471-2105-9-307

URL : http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2010.00740.x/pdf

J. Mencía and E. Sentana, Multivariate Location-Scale Mixtures of Normals and Mean-Variance-Skewness Portfolio Allocation, Journal of Econometrics, vol.153, issue.2, pp.105-121, 2009.
DOI : 10.2139/ssrn.1413060

J. A. Miller, C. Cai, P. Langfelder, D. H. Geschwind, S. M. Kurian et al., Strategies for aggregating gene expression data: The collapseRows R function, BMC Bioinformatics, vol.12, issue.1, p.1, 2011.
DOI : 10.1093/bioinformatics/btl163

T. P. Minka, Automatic choice of dimensionality for PCA, NIPS, pp.598-604, 2000.

T. P. Minka, Expectation propagation for approximate Bayesian inference, Proceedings of the Seventeenth conference on Uncertainty in artificial intelligence, pp.362-369, 2001.

A. J. Minn, G. P. Gupta, D. Padua, P. Bos, D. X. Nguyen et al., Lung metastasis genes couple breast tumor size and metastatic spread, Proceedings of the National Academy of Sciences, pp.6740-6745, 2007.
DOI : 10.1038/nature05760

URL : http://www.pnas.org/content/104/16/6740.full.pdf

T. J. Mitchell and J. J. Beauchamp, Bayesian Variable Selection in Linear Regression, Journal of the American Statistical Association, vol.83, issue.404, pp.1023-1036, 1988.
DOI : 10.1093/biomet/51.1-2.219

B. Moghaddam, Y. Weiss, and S. Avidan, Spectral bounds for sparse PCA: Exact and greedy algorithms, Advances in neural information processing systems, pp.915-922, 2005.

S. Mohamed, K. Heller, and Z. Ghahramani, Bayesian and l1 approaches for sparse unsupervised learning, Proceedings of the 29th International Conference on Machine Learning, pp.751-758, 2012.

E. Moreno, J. Girón, and G. Casella, Posterior Model Consistency in Variable Selection as the Model Dimension Grows, Statistical Science, vol.30, issue.2, pp.228-241, 2015.
DOI : 10.1214/14-STS508

K. P. Murphy, Conjugate Bayesian analysis of the Gaussian distribution, 2007.

T. B. Murphy, N. Dean, and A. E. Raftery, Variable selection and updating in model-based discriminant analysis for high dimensional data with food authenticity applications, The Annals of Applied Statistics, vol.4, issue.1, p.396, 2010.
DOI : 10.1214/09-AOAS279SUPP

I. Murray and Z. Ghahramani, A note on the evidence and Bayesian Occam's razor, 2005.

S. Nakajima, M. Sugiyama, and D. Babacan, On Bayesian PCA: Automatic dimensionality selection and analytic solution, Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp.497-504, 2011.

N. N. Narisetty and X. He, Bayesian variable selection with shrinking and diffusing priors. The Annals of Statistics, pp.789-817, 2014.
DOI : 10.1214/14-aos1207

URL : http://doi.org/10.1214/14-aos1207

B. Natarajan, Sparse Approximate Solutions to Linear Systems, SIAM Journal on Computing, vol.24, issue.2, pp.227-234, 1995.
DOI : 10.1137/S0097539792240406

R. M. Neal, Bayesian Learning for Neural Networks, 1996.
DOI : 10.1007/978-1-4612-0745-0

R. M. Neal, Annealed importance sampling, Statistics and Computing, vol.11, issue.2, pp.125-139, 2001.
DOI : 10.1023/A:1008923215028

A. Njato-Randriamanamihaga, E. Côme, L. Oukhellou, and G. Govaert, Clustering the Vélib' dynamic origin/destination flows using a family of Poisson mixture models, Neurocomputing, 2014.

H. Ogata, A numerical integration formula based on the Bessel functions, Publications of the Research Institute for Mathematical Sciences, vol.41, issue.4, pp.949-970, 2005.
DOI : 10.2977/prims/1145474602

A. O'Hagan, Fractional Bayes factors for model comparison, Journal of the Royal Statistical Society. Series B (Methodological), pp.99-138, 1995.

R. O'Hara and M. Sillanpää, A review of Bayesian variable selection methods: what, how and which, Bayesian Analysis, vol.4, issue.1, pp.85-117, 2009.
DOI : 10.1214/09-BA403SUPP

A. Oppenheim, Inequalities Connected with Definite Hermitian Forms, Journal of the London Mathematical Society, vol.1, issue.2, pp.114-119, 1930.
DOI : 10.1112/jlms/s1-5.2.114

T. Park and G. Casella, The Bayesian Lasso, Journal of the American Statistical Association, vol.103, issue.482, pp.681-686, 2008.
DOI : 10.1198/016214508000000337

D. Passemier, Z. Li, and J. Yao, On estimation of the noise variance in high dimensional probabilistic principal component analysis, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.79, issue.1, pp.51-67, 2017.
DOI : 10.1214/11-AIHP414

URL : https://hal.archives-ouvertes.fr/hal-00851783

D. K. Pauler, J. C. Wakefield, and R. E. Kass, Bayes Factors and Approximations for Variance Component Models, Journal of the American Statistical Association, vol.94, issue.448, pp.1242-1253, 1999.
DOI : 10.1006/jmva.1995.1080

S. Petrone, J. Rousseau, and C. Scricciolo, Bayes and empirical Bayes: do they merge?, Biometrika, vol.101, issue.2, 2014.
DOI : 10.1093/biomet/ast067

URL : https://hal.archives-ouvertes.fr/hal-00767467

J. Piironen and A. Vehtari, Comparison of Bayesian predictive methods for model selection, Statistics and Computing, vol.11, issue.2, pp.711-735, 2016.
DOI : 10.1017/CBO9780511800474

M. Plummer, Penalized loss functions for Bayesian model comparison, Biostatistics, vol.9, issue.3, pp.523-539, 2008.
DOI : 10.1093/biostatistics/kxm049

B. Pötscher and H. Leeb, On the Distribution of Penalized Maximum Likelihood Estimators: The LASSO, SCAD, and Thresholding, Journal of Multivariate Analysis, vol.100, issue.9, pp.2065-2082, 2009.
DOI : 10.2139/ssrn.1027629

P. Pudlo, J. Marin, A. Estoup, J. Cornuet, M. Gautier et al., Reliable ABC model choice via random forests, Bioinformatics, vol.32, issue.6, pp.859-866, 2015.
DOI : 10.1093/bioinformatics/btv684

URL : https://hal.archives-ouvertes.fr/hal-01067925

Y. Qiu and J. Mei, RSpectra: Solvers for Large Scale Eigenvalue and SVD Problems, 2016. URL https://CRAN.R-project.org/package=RSpectra. R package version 0, pp.12-12

A. E. Raftery, Bayesian Model Selection in Social Research, Sociological Methodology, vol.25, pp.111-163, 1995.
DOI : 10.2307/271063

A. E. Raftery, Bayes Factors and BIC, Sociological Methods & Research, vol.57, issue.3, pp.411-427, 1999.
DOI : 10.2307/2096242

A. E. Raftery and Y. Zheng, Discussion, Journal of the American Statistical Association, vol.98, issue.464, pp.931-938, 2003.
DOI : 10.1198/016214503000000891

A. E. Raftery, D. Madigan, and C. T. Volinsky, Accounting for model uncertainty in survival analysis improves predictive performance (with discussion) Bayesian statistics, pp.323-349, 1996.

A. E. Raftery, T. Gneiting, F. Balabdaoui, and M. Polakowski, Using Bayesian Model Averaging to Calibrate Forecast Ensembles, Monthly Weather Review, vol.133, issue.5, pp.1155-1174, 2005.
DOI : 10.1175/MWR2906.1

C. E. Rasmussen and Z. Ghahramani, Occam's razor, Advances in neural information processing systems, pp.294-300, 2001.

P. Rigollet and A. Tsybakov, Exponential screening and optimal rates of sparse estimation. The Annals of Statistics, pp.731-771, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00606059

M. Ringnér, What is principal component analysis? Nature biotechnology, pp.303-304, 2008.

I. Rivals, L. Personnaz, L. Taing, and M. Potier, Enrichment or depletion of a GO category within a class of genes: which test?, Bioinformatics, vol.23, issue.4, pp.401-407, 2007.
DOI : 10.1093/bioinformatics/btl633

URL : https://hal.archives-ouvertes.fr/hal-00801557

C. Robert and G. Casella, Monte Carlo statistical methods, 2004.

C. P. Robert, A note on Jeffreys-Lindley paradox, Statistica Sinica, vol.3, issue.2, pp.601-608, 1993.

C. P. Robert, On the Jeffreys-Lindley Paradox, Philosophy of Science, vol.81, issue.2, pp.216-232, 2014.
DOI : 10.1086/675729

URL : https://hal.archives-ouvertes.fr/hal-00853651

C. P. Robert, The expected demise of the Bayes factor, Journal of Mathematical Psychology, vol.72, pp.33-37, 2016.
DOI : 10.1016/j.jmp.2015.08.002

URL : https://hal.archives-ouvertes.fr/hal-01409264

C. P. Robert and D. Wraith, Computational methods for Bayesian model choice, The 29th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering. AIP Conference Proceedings, pp.251-262, 2009.
DOI : 10.1063/1.3275622

URL : https://hal.archives-ouvertes.fr/hal-00408203

C. P. Robert, N. Chopin, and J. Rousseau, Harold Jeffreys's Theory of Probability Revisited, Statistical Science, vol.24, issue.2, pp.141-172, 2009.
DOI : 10.1214/09-STS284

P. Robert and Y. Escoufier, A Unifying Tool for Linear Multivariate Statistical Methods: The RV-Coefficient, Applied Statistics, vol.25, issue.3, pp.257-265, 1976.
DOI : 10.2307/2347233

V. Ročková and E. I. George, EMVS: The EM Approach to Bayesian Variable Selection, Journal of the American Statistical Association, vol.109, issue.506, pp.828-846, 2014.
DOI : 10.1561/2200000001

G. Rota, The Number of Partitions of a Set, The American Mathematical Monthly, vol.71, issue.5, pp.498-504, 1964.
DOI : 10.2307/2312585

S. Roweis, EM algorithms for PCA and SPCA, Advances in neural information processing systems, pp.626-632, 1998.

L. Sagun, U. Evci, V. U. Guney, Y. Dauphin, and L. Bottou, Empirical analysis of the Hessian of over-parametrized neural networks. arXiv preprint, 2017.

J. Salvatier, T. V. Wiecki, and C. Fonnesbeck, Probabilistic programming in Python using PyMC3, PeerJ Computer Science, vol.2, e55, 2016.
DOI : 10.7717/peerj-cs.55

R. Schaback and Z. Wu, Operators on radial functions, Journal of Computational and Applied Mathematics, vol.73, issue.1-2, pp.257-270, 1996.
DOI : 10.1016/0377-0427(96)00047-7

URL : https://doi.org/10.1016/0377-0427(96)00047-7

T. E. Scheetz, K. A. Kim, R. E. Swiderski, A. R. Philp, T. A. Braun et al., Regulation of gene expression in the mammalian eye and its relevance to eye disease, Proceedings of the National Academy of Sciences, pp.14429-14434, 2006.
DOI : 10.1111/1467-9868.00346

M. Schroeder, B. Haibe-Kains, A. Culhane, C. Sotiriou, G. Bontempi et al., breastCancerVDX: Gene expression datasets published by Wang et al. (2005) and Minn et al. (2007), Bioconductor R package.

G. Schwarz, Estimating the Dimension of a Model, The Annals of Statistics, vol.6, issue.2, pp.461-464, 1978.
DOI : 10.1214/aos/1176344136

D. W. Scott and J. R. Thompson, Probability density estimation in higher dimensions, Computer Science and Statistics: Proceedings of the fifteenth symposium on the interface, pp.173-179, 1983.

J. Scott and J. Berger, Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem, The Annals of Statistics, vol.38, issue.5, pp.2587-2619, 2010.
DOI : 10.1214/10-AOS792

M. Seeger, Bayesian Gaussian process models: PAC-Bayesian generalisation error bounds and sparse approximations, 2003.
DOI : 10.1162/153244303765208386

J. Shawe-Taylor and R. C. Williamson, A PAC analysis of a Bayesian estimator, Proceedings of the tenth annual conference on Computational learning theory, COLT '97, pp.2-9, 1997.
DOI : 10.1145/267460.267466

H. Shen and J. Z. Huang, Sparse principal component analysis via regularized low rank matrix approximation, Journal of Multivariate Analysis, vol.99, issue.6, pp.1015-1034, 2008.
DOI : 10.1016/j.jmva.2007.06.007

URL : https://doi.org/10.1016/j.jmva.2007.06.007

C. D. Sigg and J. M. Buhmann, Expectation-maximization for sparse and non-negative PCA, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.960-967, 2008.
DOI : 10.1145/1390156.1390277

B. Silverman and J. Ramsay, Functional Data Analysis, 2005.

L. Sirovich and M. Kirby, Low-dimensional procedure for the characterization of human faces, Journal of the Optical Society of America A, vol.4, issue.3, pp.519-524, 1987.
DOI : 10.1364/JOSAA.4.000519

S. A. Sisson, Transdimensional Markov Chains, Journal of the American Statistical Association, vol.100, issue.471, pp.1077-1089, 2005.
DOI : 10.1198/016214505000000664

T. Skeggs, Special report, visitor figures 2013. The Art Newspaper, 2014.

J. Skilling, Nested sampling for general Bayesian computation, Bayesian Analysis, vol.1, issue.4, pp.833-859, 2006.
DOI : 10.1214/06-BA127

P. Sobczyk, M. Bogdan, and J. Josse, Bayesian Dimensionality Reduction With PCA Using Penalized Semi-Integrated Likelihood, Journal of Computational and Graphical Statistics, 2017.
DOI : 10.1007/s00453-014-9891-7

URL : https://hal.archives-ouvertes.fr/hal-01342815

A. Spanos, Who Should Be Afraid of the Jeffreys-Lindley Paradox?, Philosophy of Science, vol.80, issue.1, pp.73-93, 2013.
DOI : 10.1086/668875

D. J. Spiegelhalter, N. G. Best, B. P. Carlin, and A. van der Linde, Bayesian measures of model complexity and fit, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.64, issue.4, pp.583-639, 2002.
DOI : 10.1002/1097-0258(20000915/30)19:17/18<2265::AID-SIM568>3.0.CO;2-6

D. J. Spiegelhalter, N. G. Best, B. P. Carlin, and A. van der Linde, The deviance information criterion: 12 years on (with discussion), Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.76, issue.3, pp.485-493, 2014.

J. Stoehr, A review on statistical inference methods for discrete Markov random fields. arXiv preprint, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01462078

S. Tavaré, D. J. Balding, R. C. Griffiths, and P. Donnelly, Inferring coalescence times from DNA sequence data, Genetics, vol.145, issue.2, pp.505-518, 1997.

A. E. Teschendorff, M. Journée, P. A. Absil, R. Sepulchre, and C. Caldas, Elucidating the Altered Transcriptional Programs in Breast Cancer using Independent Component Analysis, PLoS Computational Biology, vol.3, issue.8, e161, 2007.
DOI : 10.1371/journal.pcbi.0030161.st002

C. M. Theobald, An inequality with application to multivariate analysis, Biometrika, vol.62, issue.2, pp.461-466, 1975.
DOI : 10.1093/biomet/62.2.461

R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B. (Statistical Methodology), vol.58, issue.1, pp.267-288, 1996.

M. E. Tipping, Sparse Bayesian learning and the relevance vector machine, The Journal of Machine Learning Research, vol.1, pp.211-244, 2001.

M. E. Tipping and C. M. Bishop, Probabilistic Principal Component Analysis, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.61, issue.3, pp.611-622, 1999.
DOI : 10.1111/1467-9868.00196

D. Tran, A. Kucukelbir, A. B. Dieng, M. Rudolph, D. Liang et al., Edward: A library for probabilistic modeling, inference, and criticism. arXiv preprint, 2016.

D. Tran, R. Ranganath, and D. M. Blei, Deep and hierarchical implicit models. arXiv preprint, 2017.

M. Tran, D. J. Nott, and R. Kohn, Variational Bayes With Intractable Likelihood, Journal of Computational and Graphical Statistics, 2017.
DOI : 10.1214/13-STS418

M. O. Ulfarsson and V. Solo, Sparse Variable PCA Using Geodesic Steepest Descent, IEEE Transactions on Signal Processing, vol.56, issue.12, pp.5823-5832, 2008.
DOI : 10.1109/TSP.2008.2006587

M. O. Ulfarsson and V. Solo, Dimension Estimation in Noisy PCA With SURE and Random Matrix Theory, IEEE Transactions on Signal Processing, vol.56, issue.12, pp.5804-5816, 2008.
DOI : 10.1109/TSP.2008.2005865

M. O. Ulfarsson and V. Solo, Vector $l_0$ Sparse Variable PCA, IEEE Transactions on Signal Processing, vol.59, issue.5, pp.1949-1958, 2011.
DOI : 10.1109/TSP.2011.2112653

A. W. van der Vaart, Asymptotic statistics, 2000.

A. Vehtari and J. Ojanen, A survey of Bayesian predictive methods for model assessment, selection and comparison, Statistics Surveys, vol.6, issue.0, pp.142-228, 2012.
DOI : 10.1214/12-SS102

C. Villa and S. Walker, On the mathematics of the Jeffreys-Lindley paradox, Communications in Statistics - Theory and Methods, 2017.
DOI : 10.1111/sjos.12145

V. Q. Vu and J. Lei, Minimax sparse principal subspace estimation in high dimensions. The Annals of Statistics, pp.2905-2947, 2013.

S. G. Walker, Modern Bayesian Asymptotics, Statistical Science, vol.19, issue.1, pp.111-117, 2004.
DOI : 10.1214/088342304000000134

URL : http://doi.org/10.1214/088342304000000134

Y. Wang, J. G. Klijn, Y. Zhang, A. M. Sieuwerts, M. P. Look et al., Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer, The Lancet, vol.365, pp.671-679, 2005.

S. Watanabe, Algebraic Analysis for Singular Statistical Estimation, International Conference on Algorithmic Learning Theory, pp.39-50, 1999.
DOI : 10.1007/3-540-46769-6_4

S. Watanabe, Algebraic geometry and statistical learning theory, 2009.
DOI : 10.1017/CBO9780511800474

S. Watanabe, A widely applicable Bayesian information criterion, Journal of Machine Learning Research, vol.14, pp.867-897, 2013.

S. Watanabe, Y. Minami, A. Nakamura, and N. Ueda, Application of variational Bayesian approach to speech recognition, Advances in Neural Information Processing Systems, pp.1261-1268, 2003.

D. L. Weakliem, A Critique of the Bayesian Information Criterion for Model Selection, Sociological Methods & Research, vol.22, issue.3, pp.359-397, 1999.
DOI : 10.1177/0049124194022004002

M. D. Weinberg, Computing the Bayes Factor from a Markov Chain Monte Carlo Simulation of the Posterior Distribution, Bayesian Analysis, vol.7, issue.3, pp.737-770, 2012.
DOI : 10.1214/12-BA725

S. Weisberg, Applied Linear Regression, 1980.
DOI : 10.1002/0471704091

D. Wipf and S. Nagarajan, A new view of automatic relevance determination, Advances in neural information processing systems, pp.1625-1632, 2008.

D. Wipf and S. Nagarajan, A unified Bayesian framework for MEG/EEG source imaging, NeuroImage, vol.44, issue.3, pp.947-966, 2009.
DOI : 10.1016/j.neuroimage.2008.02.059

D. P. Wipf, B. D. Rao, and S. Nagarajan, Latent variable Bayesian models for promoting sparsity. Information Theory, IEEE Transactions on, vol.57, issue.9, pp.6236-6255, 2011.
DOI : 10.1109/tit.2011.2162174

URL : http://dsp.ucsd.edu/~dwipf/wipf_draft2009.pdf

J. Wishart and M. S. Bartlett, The distribution of second order moment statistics in a normal system, Mathematical Proceedings of the Cambridge Philosophical Society, vol.10, issue.04, pp.455-459, 1932.
DOI : 10.1093/qmath/os-2.1.130

D. M. Witten and R. Tibshirani, A Framework for Feature Selection in Clustering, Journal of the American Statistical Association, vol.105, issue.490, pp.713-726, 2010.
DOI : 10.1198/jasa.2010.tm09415

S. Wold, Cross-Validatory Estimation of the Number of Components in Factor and Principal Components Models, Technometrics, vol.20, issue.4, pp.397-405, 1978.
DOI : 10.1016/S0021-9673(01)85348-6

D. Wrinch and H. Jeffreys, On certain fundamental principles of scientific inquiry (second paper), The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, vol.45, issue.266, pp.368-374, 1923.

C. Wu, On the convergence properties of the EM algorithm. The Annals of statistics, pp.95-103, 1983.

S. Xiaoshuang, L. Zhihui, G. Zhenhua, W. Minghua, Z. Cairong et al., Sparse Principal Component Analysis via Joint L2,1-Norm Penalty, AI 2013: Advances in Artificial Intelligence, pp.148-159, 2013.
DOI : 10.1007/978-3-319-03680-9_16

J. Xie, R. Girshick, and A. Farhadi, Unsupervised deep embedding for clustering analysis, International Conference on Machine Learning, pp.478-487, 2016.

L. Xu and M. Jordan, On Convergence Properties of the EM Algorithm for Gaussian Mixtures, Neural Computation, vol.8, issue.1, pp.129-151, 1996.
DOI : 10.1162/neco.1994.6.2.334

Y. Yang, M. J. Wainwright, and M. I. Jordan, On the computational complexity of high-dimensional Bayesian variable selection, The Annals of Statistics, pp.2497-2532, 2016.

J. Ye, On Measuring and Correcting the Effects of Data Mining and Model Selection, Journal of the American Statistical Association, vol.93, issue.441, pp.120-131, 1998.
DOI : 10.1214/aos/1176348375

T. Yen, A majorization-minimization approach to variable selection using spike and slab priors. The Annals of Statistics, pp.1748-1775, 2011.

L. Yengo, J. Jacques, and C. Biernacki, Variable clustering in high dimensional linear regression models, Journal de la Société Française de Statistique, pp.38-56, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00764927

L. Yengo, J. Jacques, C. Biernacki, and M. Canouil, Variable clustering in high-dimensional linear regression: The R package clere, The R Journal, vol.8, issue.1, pp.92-106, 2016.
URL : https://hal.archives-ouvertes.fr/hal-00940929

G. Yu and Q. He, ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization, Molecular BioSystems, vol.15, issue.2, 2016.
DOI : 10.1186/gb-2014-15-2-r23

L. Yu, R. R. Snapp, T. Ruiz, and M. Radermacher, Probabilistic principal component analysis with expectation maximization (PPCA-EM) facilitates volume classification and estimates the missing data, Journal of Structural Biology, vol.171, issue.1, pp.18-30, 2010.
DOI : 10.1016/j.jsb.2010.04.002

Y. Yu, On normal variance-mean mixtures, Statistics & Probability Letters, vol.121, pp.45-50, 2017.
DOI : 10.1016/j.spl.2016.07.024

URL : http://arxiv.org/pdf/1106.2333

A. Zellner, On assessing prior distributions and Bayesian regression analysis with g-prior distributions. Bayesian inference and decision techniques: Essays in Honor of Bruno De Finetti, pp.233-243, 1986.

A. Zellner, Keep it sophisticatedly simple, Simplicity, inference and modelling: Keeping it sophisticatedly simple, pp.242-262, 2001.
DOI : 10.1017/CBO9780511493164.014

C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, Understanding deep learning requires rethinking generalization, Proceedings of the fifth International Conference on Learning Representations, 2017.

T. Zhang, Information-theoretic upper and lower bounds for statistical estimation, IEEE Transactions on Information Theory, vol.52, issue.4, pp.1307-1321, 2006.
DOI : 10.1109/TIT.2005.864439

Y. Zhang and L. El Ghaoui, Large-scale sparse principal component analysis with application to text data, Advances in Neural Information Processing Systems, pp.532-539, 2011.

Y. Zhang, A. d'Aspremont, and L. El Ghaoui, Sparse PCA: Convex Relaxations, Algorithms and Applications, Handbook on Semidefinite, Conic and Polynomial Optimization, pp.915-940, 2012.
DOI : 10.1007/978-1-4614-0769-0_31

P. Zhao and B. Yu, On model selection consistency of lasso, The Journal of Machine Learning Research, vol.7, pp.2541-2563, 2006.

M. Zhu and A. Ghodsi, Automatic dimensionality selection from the scree plot via the use of profile likelihood, Computational Statistics & Data Analysis, vol.51, issue.2, pp.918-930, 2006.
DOI : 10.1016/j.csda.2005.09.010

H. Zou, The Adaptive Lasso and Its Oracle Properties, Journal of the American Statistical Association, vol.101, issue.476, pp.1418-1429, 2006.
DOI : 10.1198/016214506000000735

H. Zou and T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.67, issue.2, pp.301-320, 2005.
DOI : 10.1073/pnas.201162998

H. Zou, T. Hastie, and R. Tibshirani, Sparse Principal Component Analysis, Journal of Computational and Graphical Statistics, vol.15, issue.2, pp.265-286, 2006.
DOI : 10.1198/106186006X113430