Binarsity: a penalization for one-hot encoded features - HAL open archive
Journal article, Journal of Machine Learning Research, Year: 2019

Binarsity: a penalization for one-hot encoded features

Abstract

This paper deals with the problem of large-scale linear supervised learning in settings where a large number of continuous features are available. We propose to combine the well-known trick of one-hot encoding of continuous features with a new penalization called binarsity. In each group of binary features coming from the one-hot encoding of a single raw continuous feature, this penalization uses total-variation regularization together with an extra linear constraint to avoid collinearity within groups. Non-asymptotic oracle inequalities for generalized linear models are established, and numerical experiments illustrate the good performance of our approach on several datasets. It is also noteworthy that our method has a numerical complexity comparable to that of standard L1 penalization.
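The two ingredients named in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes quantile-based bins for the one-hot encoding, unit weights on the total-variation terms, and a sum-to-zero condition as the linear constraint within each group. The function names (`one_hot_binarize`, `group_tv_penalty`, `binarsity_like_penalty`) are hypothetical and chosen only for this illustration.

```python
import numpy as np

def one_hot_binarize(x, n_bins=5):
    """One-hot encode a continuous feature (quantile bins are an assumption)."""
    x = np.asarray(x, dtype=float)
    # n_bins - 1 interior cut points taken at evenly spaced quantiles.
    cuts = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    # Bin index in {0, ..., n_bins - 1} for every sample.
    idx = np.searchsorted(cuts, x, side="right")
    # Each row has exactly one 1, in the column of the sample's bin.
    return np.eye(n_bins)[idx]

def group_tv_penalty(theta_group, weights=None, tol=1e-10):
    """Weighted total variation of one group's coefficients.

    Returns +inf when the assumed sum-to-zero constraint is violated.
    """
    theta_group = np.asarray(theta_group, dtype=float)
    if abs(theta_group.sum()) > tol:
        return np.inf
    diffs = np.abs(np.diff(theta_group))
    if weights is None:
        weights = np.ones_like(diffs)  # unit weights: an assumption
    return float(np.dot(weights, diffs))

def binarsity_like_penalty(theta, group_sizes, weights=None):
    """Sum the group penalties over consecutive blocks of theta."""
    theta = np.asarray(theta, dtype=float)
    total, start = 0.0, 0
    for j, d in enumerate(group_sizes):
        w = None if weights is None else weights[j]
        total += group_tv_penalty(theta[start:start + d], w)
        start += d
    return total
```

Because the penalty acts on differences between coefficients of adjacent bins, it favors coefficient vectors that are piecewise constant within each group, which amounts to merging neighboring bins of a raw feature. The constraint removes the redundancy caused by the columns of each one-hot block summing to one.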
Main file
alaya17a.pdf (4.2 MB)
code.jpg (173.35 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01648382, version 1 (25-11-2017)

Identifiers

  • HAL Id: hal-01648382, version 1

Cite

Mokhtar Z. Alaya, Simon Bussy, Stéphane Gaïffas, Agathe Guilloux. Binarsity: a penalization for one-hot encoded features. Journal of Machine Learning Research, 2019, 20 (118), pp. 1-34. ⟨hal-01648382⟩
