Preprint, Working Paper Year: 2022

Learning from aggregated data with a maximum entropy model

Alexandre Gilotte
  • Role: Author
Ahmed Ben Yamed
  • Role: Author
  • PersonId : 1161754
David Rohde
  • Role: Author
  • PersonId : 1053680

Abstract

Aggregating a dataset and then injecting noise is a simple and common way to release differentially private data. However, aggregated data, even without noise, is not an appropriate input for machine learning classifiers. In this work, we show how a new model, similar to a logistic regression, may be learned from aggregated data alone by approximating the unobserved feature distribution with a maximum entropy hypothesis. The resulting model is a Markov Random Field (MRF), and we detail how to apply, modify, and scale an MRF training algorithm to our setting. Finally, we present empirical evidence on several public datasets that a model learned this way can achieve performance comparable to that of a logistic model trained on the full unaggregated data.
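As an illustration of the release mechanism the abstract starts from (aggregate, then add noise), here is a minimal Python sketch. It is not the authors' code: the single-feature count tables, the toy rows, the helper names (aggregate, laplace, add_noise), and the choice of Laplace noise with the given sensitivity are all assumptions made for this example, and the aggregation scheme used in the paper may differ.

```python
import random
from collections import Counter

def aggregate(rows, labels):
    """Count examples and positive labels per (feature index, value) pair."""
    counts, positives = Counter(), Counter()
    for row, y in zip(rows, labels):
        for k, value in enumerate(row):
            counts[(k, value)] += 1
            positives[(k, value)] += y
    return counts, positives

def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def add_noise(table, epsilon, sensitivity, rng):
    """Add independent Laplace noise of scale sensitivity / epsilon to each count."""
    scale = sensitivity / epsilon
    return {key: n + laplace(scale, rng) for key, n in table.items()}

# Hypothetical toy data: two categorical features and a binary label.
rows = [("fr", "mobile"), ("fr", "desktop"), ("us", "mobile"), ("us", "mobile")]
labels = [1, 0, 0, 1]

counts, positives = aggregate(rows, labels)

# Assumption: one row touches one count per feature, so the L1 sensitivity
# of this single table is the number of features.
rng = random.Random(0)
noisy_counts = add_noise(counts, epsilon=1.0, sensitivity=len(rows[0]), rng=rng)
```

Only tables such as counts and positives (or their noisy versions) would be released. The individual rows, and hence the joint distribution of the features, are no longer observed, which is the gap the maximum entropy hypothesis is meant to fill.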
Main file
aggregates.pdf (594.24 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03770740, version 1 (06-09-2022)
hal-03770740, version 2 (04-10-2022)

Identifiers

  • HAL Id: hal-03770740, version 2
  • ARXIV: 3796293

Cite

Alexandre Gilotte, Ahmed Ben Yamed, David Rohde. Learning from aggregated data with a maximum entropy model. 2022. ⟨hal-03770740v2⟩
