Unsupervised Concept Annotation Using Latent Dirichlet Allocation and Segmental Methods

Nathalie Camelin; Boris Detienne; Stéphane Huet; Dominique Quadri; Fabrice Lefèvre

Conference paper. Year: 2011


Nathalie Camelin

Role: Author
PersonId: 13884
IdHAL: nathalie-camelin
ORCID: 0000-0001-6786-1372
IdRef: 121979318

Laboratoire Informatique d'Avignon

Boris Detienne

Role: Author
PersonId: 11498
IdHAL: boris-detienne
ORCID: 0000-0003-0554-1407
IdRef: 120195828

Laboratoire Informatique d'Avignon

Stéphane Huet

Role: Author
PersonId: 10005
IdHAL: shuet
ORCID: 0000-0003-1838-3807
IdRef: 110355245

Laboratoire Informatique d'Avignon

Dominique Quadri

Role: Author
PersonId: 964572

Laboratoire Informatique d'Avignon

Fabrice Lefèvre

Role: Author
PersonId: 175133
IdHAL: fabricelefevre
IdRef: 089427092

Laboratoire Informatique d'Avignon

Abstract

Training efficient statistical approaches for natural language understanding generally requires data with segmental semantic annotations. Unfortunately, building such resources is costly. In this paper, we propose an approach that produces annotations in an unsupervised way. The first step is an implementation of latent Dirichlet allocation that produces a set of topics, together with the probability that each topic is associated with each word in a sentence. This knowledge is then used as a bootstrap to infer a segmentation of a word sequence into topics, using either integer linear optimisation or stochastic word alignment models (IBM models), to produce the final semantic annotation. The relation between automatically derived topics and task-dependent concepts is evaluated on a spoken dialogue task with an available reference annotation.
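The second step described in the abstract, turning per-word topic probabilities into a segmentation, can be illustrated with a minimal sketch. This is not the authors' method: the abstract names integer linear optimisation and IBM alignment models, whereas the sketch below swaps in a simple Viterbi-style dynamic programme that picks one topic per word and pays a fixed cost each time the topic changes. The function name `segment`, the `switch_penalty` parameter, the topic labels `T0` to `T2`, and all probability values are hypothetical and chosen only for illustration.

```python
import math


def segment(words, topic_probs, topics, switch_penalty=1.0):
    """Label each word with one topic, penalising topic changes (illustrative only)."""
    if not words:
        return []
    floor = 1e-6  # assumed smoothing value for unseen (word, topic) pairs

    def score(word, topic):
        return math.log(topic_probs.get(word, {}).get(topic, floor))

    # best[t] = (score, labels) of the best labelling so far that ends in topic t
    best = {t: (score(words[0], t), [t]) for t in topics}
    for word in words[1:]:
        new_best = {}
        for t in topics:
            def cost(p, t=t):
                return switch_penalty if p != t else 0.0
            prev = max(topics, key=lambda p: best[p][0] - cost(p))
            prev_score, prev_labels = best[prev]
            new_best[t] = (prev_score - cost(prev) + score(word, t), prev_labels + [t])
        best = new_best
    labels = max(best.values(), key=lambda item: item[0])[1]

    # Merge consecutive words that share a topic into one segment.
    segments = []
    for word, label in zip(words, labels):
        if segments and segments[-1][0] == label:
            segments[-1][1].append(word)
        else:
            segments.append((label, [word]))
    return segments


# Hypothetical p(topic | word) values standing in for the output of the LDA step.
toy_probs = {
    "i": {"T0": 0.8, "T1": 0.1, "T2": 0.1},
    "want": {"T0": 0.7, "T1": 0.2, "T2": 0.1},
    "a": {"T0": 0.3, "T1": 0.4, "T2": 0.3},
    "room": {"T0": 0.1, "T1": 0.8, "T2": 0.1},
    "in": {"T0": 0.3, "T1": 0.3, "T2": 0.4},
    "paris": {"T0": 0.05, "T1": 0.05, "T2": 0.9},
}
toy_words = ["i", "want", "a", "room", "in", "paris"]
toy_segments = segment(toy_words, toy_probs, ["T0", "T1", "T2"])
print(toy_segments)
```

On this toy input the sketch groups the words into three segments: "i want" under T0, "a room" under T1, and "in paris" under T2. The switch penalty is what pulls the weakly attributed words "a" and "in" into the segment of their stronger neighbour; an integer linear programme, as named in the abstract, could encode comparable contiguity constraints explicitly.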

Domains

Computation and Language [cs.CL]; Artificial Intelligence [cs.AI]

Main file

UNSUP11b.pdf (313.92 KB)

Origin: Files produced by the author(s)

Contributor: bibliothèque Universitaire Déposants HAL-Avignon

https://hal.science/hal-01314555

Submitted on: Wednesday, 27 February 2019, 14:01:19

Last modified on: Monday, 16 November 2020, 11:58:02

Dates and versions

hal-01314555, version 1 (27-02-2019)

Identifiers

HAL Id: hal-01314555, version 1

Cite

Nathalie Camelin, Boris Detienne, Stéphane Huet, Dominique Quadri, Fabrice Lefèvre. Unsupervised Concept Annotation Using Latent Dirichlet Allocation and Segmental Methods. EMNLP Workshop on Unsupervised Learning in NLP (UNSUP), Jul 2011, Edinburgh, United Kingdom. pp.72-81. ⟨hal-01314555⟩


Collections

UNIV-AVIGNON LIA

189 views

20 downloads
