Decentralized Topic Modelling with Latent Dirichlet Allocation

Igor Colin 1, Christophe Dupuy 2, 3
3 SIERRA - Statistical Machine Learning and Parsimony
DI-ENS - Département d'informatique de l'École normale supérieure, CNRS - Centre National de la Recherche Scientifique, Inria de Paris
Abstract: Privacy-preserving networks can be modelled as decentralized networks (e.g., sensors, connected objects, smartphones), where communication between nodes is not controlled by a master or central node. For this type of network, the main issue is to gather or learn global information on the network (e.g., by optimizing a global cost function) while keeping the (sensitive) information at each node. In this work, we focus on text information that agents do not want to share (e.g., text messages, emails, confidential reports). We use recent advances in decentralized optimization and topic models to infer topics from a graph with limited communication. We propose a method to adapt the latent Dirichlet allocation (LDA) model to decentralized optimization and show on synthetic data that each node still recovers parameters and performance similar to those obtained with stochastic methods that have access to all the information in the graph.
Document type:
Conference paper
NIPS 2016 - 30th Conference on Neural Information Processing Systems, Dec 2016, Barcelona, Spain

Cited literature [17 references]

https://hal.archives-ouvertes.fr/hal-01383111
Contributor: Christophe Dupuy
Submitted on: Tuesday, 18 October 2016 - 10:18:10
Last modified on: Thursday, 26 April 2018 - 10:29:05

File

ColinDupuy2016.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01383111, version 1
  • ARXIV : 1610.01417

Citation

Igor Colin, Christophe Dupuy. Decentralized Topic Modelling with Latent Dirichlet Allocation. NIPS 2016 - 30th Conference on Neural Information Processing Systems, Dec 2016, Barcelona, Spain. 〈hal-01383111〉
