Feature quantization for parsimonious and interpretable predictive models

Adrien Ehrhardt; Christophe Biernacki; Vincent Vandewalle; Philippe Heinrich

Preprint, Working Paper. Year: 2019


Adrien Ehrhardt

Role: Author
PersonId : 171162
IdHAL : adrien-ehrhardt
ORCID : 0000-0002-4448-3644

MOdel for Data Analysis and Learning

Christophe Biernacki

Role: Author
PersonId : 923939

MOdel for Data Analysis and Learning

Vincent Vandewalle

Role: Author
PersonId : 6383
IdHAL : vincent-vandewalle
ORCID : 0000-0003-2946-9059
IdRef : 14348091X

MOdel for Data Analysis and Learning

Philippe Heinrich

Role: Author
PersonId : 872598

Laboratoire Paul Painlevé - UMR 8524

Abstract

For regulatory and interpretability reasons, logistic regression is still widely used by financial institutions to learn the repayment probability of a loan from applicants' historical data. To improve prediction accuracy and interpretability, a preprocessing step quantizing both continuous and categorical data is usually performed: continuous features are discretized by assigning factor levels to intervals and, if numerous, levels of categorical features are grouped. However, better predictive accuracy can be reached by embedding this quantization estimation step directly into the predictive estimation step itself. Doing so requires optimizing the predictive loss over a huge, intractable, discontinuous quantization set. To overcome this difficulty, we introduce a specific two-step optimization strategy: first, the optimization problem is relaxed by approximating discontinuous quantization functions by smooth functions; second, the resulting relaxed optimization problem is solved via a particular neural network and stochastic gradient descent. The strategy then gives access to good candidates for the original optimization problem after a straightforward maximum a posteriori procedure to obtain cutpoints. The good performance of this approach, which we call glmdisc, is illustrated on simulated and real data from the UCI library and Crédit Agricole Consumer Finance (a major European historic player in the consumer credit market). The results show that practitioners finally have an automatic all-in-one tool that answers their recurring needs of quantization for predictive tasks.
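The two-step strategy described in the abstract can be illustrated with a minimal sketch. The abstract does not say which smooth functions or which network are used, so everything specific here is an assumption made for illustration: a softmax over affine functions of a single continuous feature stands in for the smooth surrogate, a logistic regression on the soft levels stands in for the network, the data are synthetic, and all names (`soft_quantize`, `log_loss`, `a`, `b`, `theta`) are hypothetical. This is not the glmdisc implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): one continuous feature whose effect on the
# log-odds is piecewise constant, with true cutpoints at 1/3 and 2/3.
n, m = 2000, 3  # m = maximum number of levels allowed
x = rng.uniform(0.0, 1.0, size=n)
true_logit = np.array([-2.0, 2.0, -2.0])[np.digitize(x, [1 / 3, 2 / 3])]
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)


def soft_quantize(x, a, b):
    """Smooth surrogate (assumed form): softmax over m affine functions of x."""
    z = a + np.outer(x, b)              # shape (len(x), m)
    z -= z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def log_loss(x, y, a, b, theta):
    """Mean logistic loss of a logistic regression on the soft levels."""
    p = 1.0 / (1.0 + np.exp(-(soft_quantize(x, a, b) @ theta)))
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


# Step 1 parameters (relaxation) and logistic coefficients, one per level.
a = rng.normal(size=m)
b = rng.normal(scale=5.0, size=m)
theta = np.zeros(m)
loss_before = log_loss(x, y, a, b, theta)

# Step 2: mini-batch stochastic gradient descent with hand-written gradients.
lr, batch = 0.3, 64
for epoch in range(200):
    order = rng.permutation(n)
    for start in range(0, n, batch):
        idx = order[start:start + batch]
        xb, yb = x[idx], y[idx]
        q = soft_quantize(xb, a, b)
        p = 1.0 / (1.0 + np.exp(-(q @ theta)))
        d_eta = (p - yb) / len(idx)                 # dLoss / dlogit
        d_q = np.outer(d_eta, theta)                # dLoss / dq
        d_z = q * (d_q - (d_q * q).sum(axis=1, keepdims=True))  # softmax backprop
        theta -= lr * (q.T @ d_eta)
        a -= lr * d_z.sum(axis=0)
        b -= lr * (d_z * xb[:, None]).sum(axis=0)

loss_after = log_loss(x, y, a, b, theta)

# Maximum a posteriori step: assign each x to its most probable level and
# read the cutpoints off the places where that level changes.
grid = np.linspace(0.0, 1.0, 1001)
levels = soft_quantize(grid, a, b).argmax(axis=1)
cutpoints = grid[1:][levels[1:] != levels[:-1]]

print("loss:", loss_before, "->", loss_after)
print("estimated cutpoints:", cutpoints)
```

In this sketch `m` only caps the number of levels: a level that is never the most probable one disappears at the maximum a posteriori step, so the hard quantization can end up with fewer than `m` intervals. How close the estimated cutpoints land to 1/3 and 2/3 depends on the initialization and learning rate chosen here.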

Domains

Methodology [stat.ME]

Main file

feature_quantization.pdf (344.65 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-01949135

Submitted on: Thursday, March 21, 2019, 11:32:58

Last modified on: Saturday, April 27, 2024, 03:10:44

Long-term archiving on: Saturday, June 22, 2019, 13:36:08

Dates and versions

hal-01949135 , version 1 (21-12-2018)

hal-01949135 , version 2 (21-03-2019)

Identifiers

HAL Id : hal-01949135 , version 2

Cite

Adrien Ehrhardt, Christophe Biernacki, Vincent Vandewalle, Philippe Heinrich. Feature quantization for parsimonious and interpretable predictive models. 2019. ⟨hal-01949135v2⟩


Collections

CNRS INRIA INSMI INRIA2 UNIV-LILLE LPP-MATH

72 views

316 downloads
