Les Entitées Nommées, de la linguistique au TAL : Statut théorique et méthodes de désambiguïsation [Named Entities, from Linguistics to NLP: Theoretical Status and Disambiguation Methods]

Abstract : Introduced as part of the last Message Understanding Conferences dedicated to information extraction, named entity extraction is a well-studied task in Natural Language Processing. The recognition and categorization of person names, location names, organisation names, etc. is regarded as a fundamental process for a wide variety of natural language processing applications dealing with content analysis, and many research works are devoted to it, achieving very good results. Following this success, named entity processing is moving towards new research prospects, among them disambiguation and fine-grained annotation. However, these new challenges make the question of how to define named entities, little discussed until now, even more crucial. Two main lines were explored during this PhD project: first, we tried to propose a definition of named entities; then, we experimented with disambiguation methods. After presenting the named entity recognition task and its state of the art, we had to examine, from a methodological point of view, how to tackle the question of defining named entities. Our approach led us to study first the linguistic side, with proper names and definite descriptions, and then the computational side, with the final aim of proposing a named entity definition that takes into account linguistic aspects as well as the capacities and requirements of computer systems. The remainder of the dissertation is more experimental, presenting experiments on fine-grained named entity annotation and on metonymy resolution methods.
Cited literature : 114 references

https://hal.archives-ouvertes.fr/tel-01639190
Contributor : Maud Ehrmann
Submitted on : Monday, November 20, 2017 - 12:57:49 PM
Last modification on : Tuesday, February 12, 2019 - 10:30:06 AM
Document(s) archived on : Wednesday, February 21, 2018 - 4:02:17 PM

File

2008-065.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-01639190, version 1

Citation

Maud Ehrmann. Les Entitées Nommées, de la linguistique au TAL : Statut théorique et méthodes de désambiguïsation. Computation and Language [cs.CL]. Paris Diderot University, 2008. French. ⟨tel-01639190⟩

Metrics

Record views

407

File downloads

498