M. Abramowitz and I. Stegun, Handbook of Mathematical Functions, American Journal of Physics, vol.34, issue.2, 1965.
DOI : 10.1119/1.1972842

M. Aminghafari, N. Cheze, and J. Poggi, Multivariate denoising using wavelets and principal component analysis, Computational Statistics & Data Analysis, vol.50, issue.9, pp.2381-2398, 2006.
DOI : 10.1016/j.csda.2004.12.010

C. Archambeau and F. Bach, Sparse probabilistic projections, Advances in neural information processing systems, pp.73-80, 2009.

Y. Benjamini and Y. Hochberg, Controlling the false discovery rate: a practical and powerful approach to multiple testing, Journal of the Royal Statistical Society: Series B (Statistical Methodology), pp.289-300, 1995.

C. Biernacki, G. Celeux, and G. Govaert, Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models, Computational Statistics & Data Analysis, vol.41, issue.3-4, pp.561-575, 2003.
DOI : 10.1016/S0167-9473(02)00163-9

C. M. Bishop, Bayesian PCA, Advances in Neural Information Processing Systems, pp.382-388, 1999.

C. M. Bishop, Variational principal components, 9th International Conference on Artificial Neural Networks: ICANN '99, pp.509-514, 1999.
DOI : 10.1049/cp:19991160

C. M. Bishop, Pattern recognition and machine learning, 2006.

C. Bouveyron, G. Celeux, and S. Girard, Intrinsic dimension estimation by maximum likelihood in isotropic probabilistic PCA, Pattern Recognition Letters, vol.32, issue.14, pp.1706-1713, 2011.
DOI : 10.1016/j.patrec.2011.07.017
URL : https://hal.archives-ouvertes.fr/hal-00440372

M. J. Brusco, A comparison of simulated annealing algorithms for variable selection in principal component analysis and discriminant analysis, Computational Statistics & Data Analysis, vol.77, pp.38-53, 2014.
DOI : 10.1016/j.csda.2014.03.001

G. Celeux, M. Anbari, J. Marin, and C. P. Robert, Regularization in Regression: Comparing Bayesian and Frequentist Methods in a Poorly Informative Situation, Bayesian Analysis, vol.7, issue.2, pp.477-502, 2012.
DOI : 10.1214/12-BA716
URL : https://hal.archives-ouvertes.fr/hal-00943727

T. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng et al., PCANet: A simple deep learning baseline for image classification?, IEEE Transactions on Image Processing, vol.24, issue.12, pp.5017-5032, 2015.

A. d'Aspremont, F. Bach, and L. El Ghaoui, Optimal solutions for sparse principal component analysis, The Journal of Machine Learning Research, vol.9, pp.1269-1294, 2008.

K. Fang, S. Kotz, and K. W. Ng, Symmetric multivariate and related distributions, 1990.
DOI : 10.1007/978-1-4899-2937-2

A. Gramfort, D. Strohmeier, J. Haueisen, M. S. Hämäläinen, and M. Kowalski, Time-frequency mixed-norm estimates: Sparse M/EEG imaging with non-stationary source activations, NeuroImage, vol.70, pp.410-422, 2013.
DOI : 10.1016/j.neuroimage.2012.12.051
URL : https://hal.archives-ouvertes.fr/hal-00773276

Q. Gu, Z. Li, and J. Han, Joint feature selection and subspace learning, Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), p.1294, 2011.

Y. Guan and J. G. Dy, Sparse probabilistic principal component analysis, International Conference on Artificial Intelligence and Statistics, pp.185-192, 2009.

P. Hartman and G. S. Watson, "Normal" distribution functions on spheres and the modified Bessel functions, The Annals of Probability, pp.593-607, 1974.

T. Hastie, R. Tibshirani, and M. Wainwright, Statistical Learning with Sparsity: The Lasso and Generalizations, 2015.

H. Hotelling, Analysis of a complex of statistical variables into principal components, Journal of Educational Psychology, vol.24, issue.6, p.417, 1933.
DOI : 10.1037/h0071325

A. Ilin and T. Raiko, Practical approaches to principal component analysis in the presence of missing values, The Journal of Machine Learning Research, vol.11, pp.1957-2000, 2010.

R. Jenatton, G. Obozinski, and F. Bach, Structured sparse principal component analysis, International Conference on Artificial Intelligence and Statistics, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00414158

I. M. Johnstone and A. Y. Lu, On Consistency and Sparsity for Principal Components Analysis in High Dimensions, Journal of the American Statistical Association, vol.104, issue.486, 2009.
DOI : 10.1198/jasa.2009.0121

I. T. Jolliffe, Discarding Variables in a Principal Component Analysis. I: Artificial Data, Applied Statistics, pp.160-173, 1972.
DOI : 10.2307/2346488

I. T. Jolliffe, Discarding Variables in a Principal Component Analysis. II: Real Data, Applied Statistics, vol.22, issue.1, pp.21-31, 1973.
DOI : 10.2307/2346300

M. Journée, Geometric algorithms for component analysis with a view to gene expression data analysis, 2009.

M. Journée, Y. Nesterov, P. Richtárik, and R. Sepulchre, Generalized power method for sparse principal component analysis, The Journal of Machine Learning Research, vol.11, pp.517-553, 2010.

Z. Khan, F. Shafait, and A. Mian, Joint Group Sparse PCA for Compressed Hyperspectral Imaging, IEEE Transactions on Image Processing, vol.24, issue.12, pp.4934-4942, 2015.
DOI : 10.1109/TIP.2015.2472280

R. Khanna, J. Ghosh, R. Poldrack, and O. Koyejo, Sparse submodular probabilistic PCA, Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, pp.453-461, 2015.

H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio, An empirical evaluation of deep architectures on problems with many factors of variation, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.473-480, 2007.
DOI : 10.1145/1273496.1273556

P. Latouche, P. Mattei, C. Bouveyron, and J. Chiquet, Combining a relaxed EM algorithm with Occam's razor for Bayesian variable selection in high-dimensional regression, Journal of Multivariate Analysis, vol.146, pp.177-190, 2016.
DOI : 10.1016/j.jmva.2015.09.004

D. N. Lawley, A modified method of estimation in factor analysis and some large sample results, Proceedings of the Uppsala Symposium on Psychological Factor Analysis, pp.35-42, 1953.

M. Lázaro-Gredilla and M. K. Titsias, Spike and slab variational inference for multi-task and multiple kernel learning, Advances in Neural Information Processing Systems, pp.2339-2347, 2011.

T. Liu, L. Trinchera, A. Tenenhaus, D. Wei, and A. O. Hero, Globally Sparse PLS Regression, New Perspectives in Partial Least Squares and Related Methods, pp.117-127, 2013.
DOI : 10.1007/978-1-4614-8283-3_7
URL : https://hal.archives-ouvertes.fr/hal-00750939

L. Lorch, Inequalities for some Whittaker functions, Archivum Mathematicum, vol.3, issue.1, pp.1-9, 1967.

D. J. Mackay, Bayesian Methods for Backpropagation Networks, Models of neural networks III, pp.211-254, 1994.
DOI : 10.1007/978-1-4612-0723-8_6

D. J. Mackay, Information theory, inference, and learning algorithms, 2003.

D. Madigan and A. E. Raftery, Model Selection and Accounting for Model Uncertainty in Graphical Models Using Occam's Window, Journal of the American Statistical Association, vol.89, issue.428, pp.1535-1546, 1994.
DOI : 10.1002/net.3230200507

M. Maechler, Bessel: Bessel Functions Computations and Approximations, R package version 0.5-5, 2013.
URL : https://CRAN.R-project.org/package=Bessel

M. Masaeli, Y. Yan, Y. Cui, G. Fung, and J. G. Dy, Convex Principal Feature Selection, SIAM International Conference on Data Mining, pp.619-628, 2010.
DOI : 10.1137/1.9781611972801.54

P. Mattei, C. Bouveyron, and P. Latouche, Globally sparse probabilistic PCA, Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pp.976-984, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01310409

J. A. Miller, C. Cai, P. Langfelder, D. H. Geschwind, S. M. Kurian et al., Strategies for aggregating gene expression data: The collapseRows R function, BMC Bioinformatics, vol.12, issue.1, p.1, 2011.
DOI : 10.1093/bioinformatics/btl163

T. P. Minka, Automatic choice of dimensionality for PCA, NIPS, pp.598-604, 2000.

A. J. Minn, G. P. Gupta, D. Padua, P. Bos, D. X. Nguyen et al., Lung metastasis genes couple breast tumor size and metastatic spread, Proceedings of the National Academy of Sciences, pp.6740-6745, 2007.
DOI : 10.1073/pnas.0701138104

T. Mitchell and J. Beauchamp, Bayesian Variable Selection in Linear Regression, Journal of the American Statistical Association, vol.83, issue.404, pp.1023-1036, 1988.
DOI : 10.1080/01621459.1982.10477809

B. Moghaddam, Y. Weiss, and S. Avidan, Spectral bounds for sparse PCA: Exact and greedy algorithms, Advances in neural information processing systems, pp.915-922, 2005.

S. Mohamed, K. Heller, and Z. Ghahramani, Bayesian and l1 approaches for sparse unsupervised learning, Proceedings of the 29th International Conference on Machine Learning (ICML-12), ICML '12, pp.751-758, 2012.

S. Nakajima, M. Sugiyama, and D. Babacan, On Bayesian PCA: Automatic dimensionality selection and analytic solution, Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp.497-504, 2011.

S. Nakajima, R. Tomioka, M. Sugiyama, and S. D. Babacan, Condition for perfect dimensionality recovery by variational Bayesian PCA, Journal of Machine Learning Research, vol.16, pp.3757-3811, 2015.

R. M. Neal, Bayesian Learning for Neural Networks, 1996.
DOI : 10.1007/978-1-4612-0745-0

H. Ogata, A numerical integration formula based on the Bessel functions, Publications of the Research Institute for Mathematical Sciences, vol.41, issue.4, pp.949-970, 2005.
DOI : 10.2977/prims/1145474602

D. Passemier, Z. Li, and J. Yao, On estimation of the noise variance in high dimensional probabilistic principal component analysis, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.48, issue.1
DOI : 10.1111/rssb.12153
URL : https://hal.archives-ouvertes.fr/hal-00851783

Y. Qiu and J. Mei, RSpectra: Solvers for Large Scale Eigenvalue and SVD Problems, R package, 2016.
URL : https://CRAN.R-project.org/package=RSpectra

M. Ringnér, What is principal component analysis?, Nature Biotechnology, pp.303-304, 2008.

I. Rivals, L. Personnaz, L. Taing, and M. Potier, Enrichment or depletion of a GO category within a class of genes: which test?, Bioinformatics, vol.23, issue.4, pp.401-407, 2007.
DOI : 10.1093/bioinformatics/btl633
URL : https://hal.archives-ouvertes.fr/hal-00801557

P. Robert and Y. Escoufier, A Unifying Tool for Linear Multivariate Statistical Methods: The RV- Coefficient, Applied Statistics, vol.25, issue.3, pp.257-265, 1976.
DOI : 10.2307/2347233

S. Roweis, EM algorithms for PCA and SPCA, Advances in Neural Information Processing Systems, pp.626-632, 1998.

R. Schaback and Z. Wu, Operators on radial functions, Journal of Computational and Applied Mathematics, vol.73, issue.1-2, pp.257-270, 1996.
DOI : 10.1016/0377-0427(96)00047-7

M. Schroeder, B. Haibe-Kains, A. Culhane, C. Sotiriou, G. Bontempi et al., breastCancerVDX: Gene expression datasets published by, 2005.

G. Schwarz, Estimating the Dimension of a Model, The Annals of Statistics, vol.6, issue.2, pp.461-464, 1978.
DOI : 10.1214/aos/1176344136

H. Shen and J. Z. Huang, Sparse principal component analysis via regularized low rank matrix approximation, Journal of Multivariate Analysis, vol.99, issue.6, pp.1015-1034, 2008.
DOI : 10.1016/j.jmva.2007.06.007

C. D. Sigg and J. M. Buhmann, Expectation-maximization for sparse and non-negative PCA, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.960-967, 2008.
DOI : 10.1145/1390156.1390277

A. E. Teschendorff, M. Journée, P. A. Absil, R. Sepulchre, and C. Caldas, Elucidating the Altered Transcriptional Programs in Breast Cancer using Independent Component Analysis, PLoS Computational Biology, vol.57, issue.8
DOI : 10.1371/journal.pcbi.0030161.st002

C. M. Theobald, An inequality with application to multivariate analysis, Biometrika, vol.62, issue.2, pp.461-466, 1975.
DOI : 10.1093/biomet/62.2.461

M. Tipping, Sparse Bayesian learning and the relevance vector machine, The Journal of Machine Learning Research, vol.1, pp.211-244, 2001.

M. E. Tipping and C. M. Bishop, Probabilistic Principal Component Analysis, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.61, issue.3, pp.611-622, 1999.
DOI : 10.1111/1467-9868.00196

M. O. Ulfarsson and V. Solo, Sparse Variable PCA Using Geodesic Steepest Descent, IEEE Transactions on Signal Processing, vol.56, issue.12, pp.5823-5832, 2008.
DOI : 10.1109/TSP.2008.2006587

M. O. Ulfarsson and V. Solo, Vector $l_0$ Sparse Variable PCA, IEEE Transactions on Signal Processing, vol.59, issue.5, pp.1949-1958, 2011.
DOI : 10.1109/TSP.2011.2112653

V. Q. Vu and J. Lei, Minimax sparse principal subspace estimation in high dimensions, The Annals of Statistics, pp.2905-2947, 2013.

Y. Wang, J. G. Klijn, Y. Zhang, A. M. Sieuwerts, M. P. Look et al., Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer, The Lancet, vol.365, pp.671-679, 2005.

D. Wipf and S. Nagarajan, A new view of automatic relevance determination, Advances in neural information processing systems, pp.1625-1632, 2008.

D. Wipf and S. Nagarajan, A unified Bayesian framework for MEG/EEG source imaging, NeuroImage, vol.44, issue.3, pp.947-966, 2009.
DOI : 10.1016/j.neuroimage.2008.02.059

D. P. Wipf, B. D. Rao, and S. Nagarajan, Latent variable Bayesian models for promoting sparsity, IEEE Transactions on Information Theory, vol.57, issue.9, pp.6236-6255, 2011.

J. Wishart and M. S. Bartlett, The distribution of second order moment statistics in a normal system, Mathematical Proceedings of the Cambridge Philosophical Society, vol.10, issue.04, 1932.
DOI : 10.1093/qmath/os-2.1.130

S. Xiaoshuang, L. Zhihui, G. Zhenhua, W. Minghua, Z. Cairong et al., Sparse Principal Component Analysis via Joint L2,1-Norm Penalty, AI 2013: Advances in Artificial Intelligence, pp.148-159, 2013.
DOI : 10.1007/978-3-319-03680-9_16

L. Xu and M. Jordan, On Convergence Properties of the EM Algorithm for Gaussian Mixtures, Neural Computation, vol.8, issue.1, pp.129-151, 1996.
DOI : 10.1162/neco.1994.6.2.334

G. Yu and Q. He, ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization, Mol. BioSyst., vol.12, issue.2, 2016.
DOI : 10.1039/C5MB00663E

L. Yu, R. R. Snapp, T. Ruiz, and M. Radermacher, Probabilistic principal component analysis with expectation maximization (PPCA-EM) facilitates volume classification and estimates the missing data, Journal of Structural Biology, vol.171, issue.1, pp.18-30, 2010.
DOI : 10.1016/j.jsb.2010.04.002

Y. Zhang and L. El Ghaoui, Large-scale sparse principal component analysis with application to text data, Advances in Neural Information Processing Systems, pp.532-539, 2011.

Y. Zhang, A. d'Aspremont, and L. El Ghaoui, Sparse PCA: Convex Relaxations, Algorithms and Applications, Handbook on Semidefinite, Conic and Polynomial Optimization, pp.915-940, 2012.
DOI : 10.1007/978-1-4614-0769-0_31

H. Zou, T. Hastie, and R. Tibshirani, Sparse Principal Component Analysis, Journal of Computational and Graphical Statistics, vol.15, issue.2, pp.265-286, 2006.
DOI : 10.1198/106186006X113430