Compressed principal component analysis of non-Gaussian vectors

Marc Mignolet (Arizona State University, Tempe) and Christian Soize (MSME, Laboratoire Modélisation et Simulation Multi-Echelle, Université Gustave Eiffel / Université Paris-Est Créteil Val-de-Marne / CNRS)

Journal article, 2020. DOI: 10.1137/20M1322029
https://hal.archives-ouvertes.fr/hal-02966143

Keywords: principal component analysis; compressed principal component analysis; non-Gaussian vector; random eigenvectors; symmetric polynomials; random fields; stochastic processes; inverse problem; stochastic model; reduction method; uncertainty quantification; stochastic modeling

Abstract. A novel approximate representation of non-Gaussian random vectors is introduced and validated, which can be viewed as a Compressed Principal Component Analysis (CPCA). This representation relies on the eigenvectors of the covariance matrix obtained as in a Principal Component Analysis (PCA) but expresses the random vector as a linear combination of a random sample of $N$ of these eigenvectors. In this model, the indices of these eigenvectors are independent discrete random variables with probabilities proportional to the corresponding eigenvalues. Moreover, the coefficients of the linear combination are zero-mean, unit-variance random variables. Under these conditions, it is first shown that the covariance matrix of this CPCA matches exactly its PCA counterpart, independently of the value of $N$. Next, it is also shown that the distribution of the random coefficients can be selected, without loss of generality, to be a symmetric function.
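The covariance-matching property stated above can be checked numerically. The following is a minimal Monte Carlo sketch, not taken from the paper: the 4-dimensional target covariance, the choice of standard Gaussian coefficients, and the overall scaling $\sqrt{\mathrm{tr}/N}$ are illustrative assumptions made here so that the empirical covariance of the CPCA samples reproduces the target covariance for any $N$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative SPD target covariance (not from the paper)
A = rng.standard_normal((4, 4))
cov = A @ A.T / 4.0

# PCA step: eigenvectors phi_i and eigenvalues lam_i of the covariance
lam, phi = np.linalg.eigh(cov)
trace = lam.sum()
p = lam / trace          # index probabilities proportional to eigenvalues

N = 2                    # number of retained (random) eigenvectors
n_samples = 200_000

# Indices J_1..J_N: independent, with P(J = i) proportional to lam_i
J = rng.choice(len(lam), size=(n_samples, N), p=p)
# Coefficients C_1..C_N: zero-mean, unit-variance (Gaussian for simplicity)
C = rng.standard_normal((n_samples, N))

# CPCA realizations: X = sqrt(trace/N) * sum_k C_k * phi_{J_k}
X = np.sqrt(trace / N) * np.einsum('sk,isk->si', C, phi[:, J])

# X has zero mean by construction, so no centering is needed here
emp_cov = X.T @ X / n_samples
print(np.max(np.abs(emp_cov - cov)))  # small: CPCA covariance matches PCA's
```

Note that the expectation of each sample covariance term is exactly the target covariance because the cross terms vanish (the coefficients are independent and zero-mean) and the index distribution reweights each eigenvector by its eigenvalue.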
Then, to represent the vector of these coefficients, a novel set of symmetric vector-valued multidimensional polynomials of the canonical Gaussian random vector is derived. Interestingly, the number of such polynomials grows only slowly with the maximum polynomial order, thereby providing a framework for a compact approximation of the target random vector. The identification of the deterministic parameters of the expansion of the random coefficients on these symmetric vector-valued multidimensional polynomials is addressed next. Finally, an example of application is provided that demonstrates the good matching of the distributions of the elements of the target random vector and of its approximation with only a very limited number of parameters.

1. Introduction. The objective of this paper is to propose the Compressed Principal Component Analysis (CPCA), a novel small parameterized representation of any non-Gaussian second-order random variable $X = (X_1, \ldots, X_n)$ with values in $\mathbb{R}^n$. This representation would be useful for solving statistical inverse problems related to any stochastic computational model for which there is an uncertain vector-valued system parameter that is modeled by a random vector $X$. To explain the benefits of this representation, consider the framework of a classical statistical inverse problem. Let us assume that a parameterized representation of $X$ has been constructed and is written as $X = g(z, \Xi)$, in which $\Xi = (\Xi_1, \ldots, \Xi_N)$ is the $\mathbb{R}^N$-valued normalized Gaussian random variable (centered and with a covariance matrix that is the identity matrix), whose probability distribution on $\mathbb{R}^N$ is denoted by $P_\Xi(d\xi)$. The parameterization of the representation corresponds to the vector $z = (z_1, \ldots, z_M)$ of hyperparameters, which belongs to an admissible set $C_z \subset \mathbb{R}^M$. The measurable mapping $\xi \mapsto g(z, \xi)$ is defined through the construction of the representation.
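The pushforward structure described above can be illustrated with a toy example. This is a hypothetical sketch only: the affine-plus-cubic map $g$ and the hyperparameter values below are stand-ins chosen for illustration, not the symmetric-polynomial construction of the paper. The point is simply that once $z$ is fixed, sampling $X$ reduces to sampling the normalized Gaussian germ $\Xi$ and pushing it through $g(z, \cdot)$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameterized map g(z, xi), applied componentwise;
# the paper instead builds g from symmetric vector-valued polynomials.
def g(z, xi):
    return z[0] + z[1] * xi + z[2] * xi**3

N = 3                               # dimension of the Gaussian germ Xi
z_opt = np.array([0.5, 1.0, 0.2])   # a fixed hyperparameter vector (illustrative)

# P_X is the image of P_Xi under xi -> g(z_opt, xi): draw Xi ~ N(0, I_N)
# and map each realization through g to obtain realizations of X.
xi = rng.standard_normal((100_000, N))
x = g(z_opt, xi)
print(x.mean(axis=0))   # componentwise mean of X under z_opt (about z_opt[0])
```

In a statistical inverse problem, one would then tune $z$ so that observables computed from these samples of $X$ match measured data.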
Consequently, if $z$ is fixed to a given value $z^{\text{opt}}$, then the probability distribution $P_X$ of $X$ is completely defined as the image of $P_\Xi(d\xi)$ under the mapping $\xi \mapsto g(z^{\text{opt}}, \xi)$. Let us consider a computational model with an uncertain system parameter $x$ that is modeled by the random variable $X$. Let $Q$ be the vector-valued random quantity of interest that is constructed as an obser