Unsupervised Concept Annotation using Latent Dirichlet Allocation and Segmental Methods

Abstract: Training efficient statistical approaches for natural language understanding generally requires data with segmental semantic annotations. Unfortunately, building such resources is costly. In this paper, we propose an approach that produces annotations in an unsupervised way. The first step is an implementation of latent Dirichlet allocation (LDA) that produces a set of topics, together with the probability of each topic being associated with each word of a sentence. This knowledge is then used as a bootstrap to infer a segmentation of the sentence into topics, using either integer linear optimisation or stochastic word alignment models (IBM models), to produce the final semantic annotation. The relation between the automatically derived topics and task-dependent concepts is evaluated on a spoken dialogue task for which a reference annotation is available.
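The segmentation step can be illustrated with a small sketch. It takes per-word topic probabilities, such as those an LDA model could supply, and splits a sentence into contiguous segments labelled with one topic each.

This is not the authors' method. The paper performs this step with integer linear optimisation or IBM alignment models. The code below swaps in a simpler Viterbi-style dynamic programme in which every new segment pays a fixed penalty.

The function name `segment_by_topics`, the `segment_penalty` parameter and the toy probabilities are hypothetical.

```python
import math
from typing import List, Tuple


def segment_by_topics(
    word_topic_probs: List[List[float]],
    segment_penalty: float = 1.0,
) -> List[Tuple[int, int, int]]:
    """Split a sentence into contiguous single-topic segments.

    Hypothetical stand-in for the paper's ILP / IBM-model segmentation.
    word_topic_probs[i][k] is an assumed probability that word i belongs to
    topic k (e.g. derived from an LDA model). Returns (start, end, topic)
    triples, with `end` exclusive.
    """
    n = len(word_topic_probs)
    if n == 0:
        return []
    num_topics = len(word_topic_probs[0])
    eps = 1e-12  # avoids log(0) for zero probabilities

    # best[i][k]: best log-score for words 0..i when word i has topic k
    best = [[-math.inf] * num_topics for _ in range(n)]
    back = [[-1] * num_topics for _ in range(n)]
    for k in range(num_topics):
        # Opening the first segment also pays the penalty.
        best[0][k] = math.log(max(word_topic_probs[0][k], eps)) - segment_penalty

    for i in range(1, n):
        for k in range(num_topics):
            for j in range(num_topics):
                # Keeping the same topic is free; switching opens a new segment.
                score = best[i - 1][j] - (0.0 if j == k else segment_penalty)
                if score > best[i][k]:
                    best[i][k] = score
                    back[i][k] = j
            best[i][k] += math.log(max(word_topic_probs[i][k], eps))

    # Backtrack the best topic label for every word.
    labels = [0] * n
    labels[-1] = max(range(num_topics), key=lambda k: best[n - 1][k])
    for i in range(n - 1, 0, -1):
        labels[i - 1] = back[i][labels[i]]

    # Merge runs of identical labels into segments.
    segments = []
    start = 0
    for i in range(1, n + 1):
        if i == n or labels[i] != labels[start]:
            segments.append((start, i, labels[start]))
            start = i
    return segments
```

Usage example with two topics: `segment_by_topics([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])` returns `[(0, 2, 0), (2, 4, 1)]`. The penalty trades off fidelity to the per-word probabilities against the number of segments. A larger value merges isolated, weakly supported words into their neighbours' segment.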
Document type:
Conference paper
EMNLP 2011, Conference on Empirical Methods in Natural Language Processing, Jul 2011, Edinburgh, United Kingdom

https://hal.archives-ouvertes.fr/hal-01314555
Contributor: Bibliothèque Universitaire Déposants Hal-Avignon
Submitted on: Wednesday, 11 May 2016, 15:44:36
Last modified on: Monday, 26 November 2018, 16:12:03

Identifiers

  • HAL Id: hal-01314555, version 1

Citation

Nathalie Camelin, Boris Detienne, Stéphane Huet, Dominique Quadri, Fabrice Lefèvre. Unsupervised Concept Annotation using Latent Dirichlet Allocation and Segmental Methods. EMNLP 2011, Conference on Empirical Methods in Natural Language Processing, Jul 2011, Edinburgh, United Kingdom. 〈hal-01314555〉
