Introducing complex dependency structures into supervised component-based models

Abstract : High redundancy among explanatory variables causes identifiability problems and severe instability in regression model estimates. Even when estimation is possible, the results are then nearly impossible to interpret. It is therefore necessary to combine the model likelihood with an additional criterion that regularises the estimates. In the wake of PLS regression, the regularising strategy considered in this thesis is based on extracting supervised components. These orthogonal components must not only capture the structural information of the explanatory variables, but also predict the response variables as well as possible; the responses can be of various types (continuous or discrete, quantitative, ordinal or nominal). Regression on supervised components was developed for multivariate GLMs, but has so far been restricted to models with independent observations. However, in many situations, the observations are grouped. We propose an extension of the method to multivariate GLMMs, in which within-group correlations are modelled with random effects. At each step of Schall's algorithm for GLMM estimation, we regularise the model by extracting components that maximise a trade-off between goodness-of-fit and structural relevance. We show on simulated data that, compared with penalty-based regularisation methods such as ridge or LASSO, our method not only reveals the important explanatory dimensions for all responses, but often gives better predictions too. The method is also assessed on real data. Finally, we develop regularisation methods in the specific context of panel data (involving repeated measures on several individuals at the same time-points). Two random effects are introduced: the first one models the dependence of measures related to the same individual, while the second one models a time-specific effect (thus having a certain inertia) shared by all the individuals.
For Gaussian responses, we first propose an EM algorithm to maximise the likelihood penalised by the L2-norm of the regression coefficients. We then propose an alternative that instead gives a bonus to the "strongest" directions in the explanatory subspace. An extension of these approaches to non-Gaussian data is also proposed, and comparative tests are carried out on Poisson data.
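
The idea of a component that trades goodness-of-fit against structural relevance can be illustrated with a deliberately simplified sketch. It assumes a single Gaussian response, independent observations and no random effects, and it uses a hypothetical convex-combination criterion, (1 - s) cov(Xu, y)^2 + s var(Xu), which is not the criterion used in the thesis. The function name `tradeoff_direction`, the tuning parameter `s` and the simulated data are all illustrative assumptions.

```python
import numpy as np

def tradeoff_direction(X, y, s):
    """Unit vector u maximising (1 - s) * cov(Xu, y)^2 + s * var(Xu).

    Hypothetical criterion for illustration only: s = 0 favours pure fit
    (a PLS-like direction), s = 1 favours pure structure (first principal axis).
    """
    Xc = X - X.mean(axis=0)          # centre the explanatory variables
    yc = y - y.mean()                # centre the response
    n = X.shape[0]
    S = Xc.T @ Xc / n                # covariance matrix of X (structural part)
    c = Xc.T @ yc / n                # covariances between columns of X and y
    M = s * S + (1.0 - s) * np.outer(c, c)
    # M is symmetric, so eigh applies; eigenvalues are returned in ascending order
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, -1]            # eigenvector of the largest eigenvalue

# Simulated, highly redundant explanatory variables (assumed toy data)
rng = np.random.default_rng(0)
n, p = 200, 6
latent = rng.normal(size=(n, 1))
X = latent @ np.ones((1, p)) + 0.1 * rng.normal(size=(n, p))
y = X[:, 0] + 0.5 * rng.normal(size=n)

u_fit = tradeoff_direction(X, y, s=0.0)      # fit only
u_struct = tradeoff_direction(X, y, s=1.0)   # structure only
component = (X - X.mean(axis=0)) @ u_fit     # the extracted component
```

Intermediate values of `s` move the direction between these two extremes; the resulting component can then be used as a regressor in place of the redundant columns of `X`.
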
Cited literature [138 references]

https://hal.archives-ouvertes.fr/tel-02265667
Contributor : Jocelyn Chauvet
Submitted on : Sunday, August 11, 2019 - 2:57:26 PM
Last modification on : Wednesday, August 14, 2019 - 4:44:54 PM

File

manuscritTHESE_CHAUVET_diffusi...
Files produced by the author(s)

Identifiers

  • HAL Id : tel-02265667, version 1

Citation

Jocelyn Chauvet. Introducing complex dependency structures into supervised component-based models. Statistics [stat]. Université de Montpellier, 2019. English. ⟨tel-02265667⟩

Metrics

Record views

176

File downloads

15