Binarsity: a penalization for one-hot encoded features

Abstract: This paper deals with the problem of large-scale linear supervised learning in settings where a large number of continuous features are available. We propose to combine the well-known trick of one-hot encoding of continuous features with a new penalization called binarsity. In each group of binary features coming from the one-hot encoding of a single raw continuous feature, this penalization uses total-variation regularization together with an extra linear constraint to avoid collinearity within groups. Non-asymptotic oracle inequalities for generalized linear models are established, and numerical experiments illustrate the good performance of our approach on several datasets. Notably, our method has a numerical complexity comparable to that of standard L1 penalization.
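
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the quantile-based binning, the uniform weight `strength`, the sum-to-zero form of the linear constraint, and the function names `one_hot_binarize` and `group_tv_penalty` are all assumptions made for illustration. The abstract only states that each group is penalized by total variation and subject to a linear constraint.

```python
import numpy as np


def one_hot_binarize(x, n_bins):
    """One-hot encode a continuous feature into n_bins columns.

    Assumption: bins are delimited by empirical quantiles of x.
    """
    # Interior cut points: n_bins - 1 quantiles strictly between 0 and 1.
    cuts = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    # Bin index in {0, ..., n_bins - 1} for every sample.
    idx = np.searchsorted(cuts, x, side="right")
    out = np.zeros((x.shape[0], n_bins))
    out[np.arange(x.shape[0]), idx] = 1.0
    return out


def group_tv_penalty(theta, group_sizes, strength=1.0, tol=1e-10):
    """Within-group total variation plus a linear constraint.

    Assumption: the linear constraint is that the weights of each
    group sum to zero; a violated constraint gives an infinite penalty.
    """
    total = 0.0
    start = 0
    for d in group_sizes:
        block = theta[start:start + d]
        if abs(block.sum()) > tol:
            return np.inf
        # Total variation: sum of absolute differences of consecutive weights.
        total += strength * np.abs(np.diff(block)).sum()
        start += d
    return total
```

Under these assumptions, each row of the encoded matrix contains exactly one 1, which is why the columns of a group are collinear with an intercept and why some linear constraint per group is needed. The total-variation term favors weight vectors that are piecewise constant across consecutive bins.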
Document type:
Preprint, working paper
2017

https://hal.archives-ouvertes.fr/hal-01648382
Contributor: Simon Bussy
Submitted on: Saturday, November 25, 2017 - 17:14:25
Last modified on: Wednesday, January 23, 2019 - 10:29:27

Identifiers

  • HAL Id: hal-01648382, version 1

Citation

Mokhtar Z. Alaya, Simon Bussy, Stéphane Gaïffas, Agathe Guilloux. Binarsity: a penalization for one-hot encoded features. 2017. 〈hal-01648382〉
