
Correspondence Analysis and Classification

Gilbert Saporta 1
1 CEDRIC - MSDMA - CEDRIC. Méthodes statistiques de data-mining et apprentissage
CEDRIC - Centre d'études et de recherche en informatique et communications
Abstract : The use of correspondence analysis for discrimination purposes goes back to the “prehistory” of data analysis (Fisher, 1940), where one looks for the optimal scaling of the categories of a variable X in order to predict a categorical variable Y. When there are several categorical predictors, a commonly used technique is a two-step analysis: multiple correspondence analysis is first performed on the set of predictors, followed by a discriminant analysis that uses the factor coordinates of the units as numerical predictors (Bouroche et al., 1977). However, in banking applications (credit scoring), logistic regression is increasingly used instead of discriminant analysis when the predictors are categorical. One of the reasons advanced in favour of logistic regression is that it gives a probabilistic model, and econometricians often claim that its theoretical basis is more solid, but this is arguable. This tendency is no doubt also due to the flexibility of logistic regression software, which has been developed further than discriminant analysis software. However, it can easily be shown that discarding non-informative eigenvectors gives more robust results than direct logistic regression, since it is a regularisation technique similar to Principal Component Regression (Hastie et al., 2001). Moreover, correspondence analysis provides insight into the data, which is always useful. Since the factor coordinates are derived without taking the response variable into account, one could think of adapting PLS regression. We will show that PLS is related, at least for the first PLS component, to barycentric discrimination (Celeux and Nakache, 1994; Verde and Palumbo, 1996). For two-class discrimination, we will also present a combination of logistic regression and correspondence analysis, as well as ridge regression, which are interesting alternatives. A comparison of all these methods will be illustrated on a real case study.
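As an illustration of the barycentric idea mentioned in the abstract, the following minimal Python sketch handles the two-class case only. It is not the author's implementation. It assumes that each category is scored by the proportion of "positive" units observed in it (which, for two classes, is an affine function of the category's coordinate on the single axis of the class-by-category table), and that a unit is scored by the plain mean of the scores of its categories. The function names, the fallback to the overall positive rate for unseen categories, and the toy data are all hypothetical.

```python
from collections import defaultdict


def fit_category_scores(rows, labels, positive):
    """Score each (variable index, category) pair by the share of
    `positive` units among the units that take that category."""
    counts = defaultdict(lambda: [0, 0])  # key -> [n_positive, n_total]
    for row, y in zip(rows, labels):
        for j, cat in enumerate(row):
            counts[(j, cat)][0] += (y == positive)
            counts[(j, cat)][1] += 1
    return {key: pos / tot for key, (pos, tot) in counts.items()}


def unit_score(row, scores, default):
    """Barycentre (plain mean) of the scores of the categories a unit takes.
    Categories never seen during fitting fall back to `default`."""
    total = sum(scores.get((j, cat), default) for j, cat in enumerate(row))
    return total / len(row)


# Made-up toy data: two categorical predictors and a two-class response.
rows = [
    ("owner", "stable"),
    ("owner", "stable"),
    ("owner", "temp"),
    ("tenant", "stable"),
    ("tenant", "temp"),
    ("tenant", "temp"),
]
labels = ["good", "good", "good", "bad", "bad", "bad"]

prior = labels.count("good") / len(labels)  # overall share of "good" units
scores = fit_category_scores(rows, labels, "good")
unit_scores = [unit_score(r, scores, prior) for r in rows]
```

A unit would then be assigned to the "good" class when its score exceeds a threshold chosen by the analyst (for example on a validation sample); the sketch deliberately leaves that choice open.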
Document type : Conference papers
Contributor : Laboratoire Cedric
Submitted on : Friday, December 11, 2020 - 12:22:36 PM
Last modification on : Tuesday, December 15, 2020 - 11:16:15 AM


  • HAL Id : hal-01124807, version 1



Gilbert Saporta. Correspondence Analysis and Classification. CARME2003: Correspondence Analysis and Related Methods, Jun 2003, Barcelona, Spain. ⟨hal-01124807⟩


