HAL: inria-00617068, version 1

TALN'2011 - Traitement Automatique des Langues Naturelles, Montpellier, France (2011)
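Coopération de méthodes statistiques et symboliques pour l'adaptation non-supervisée d'un système d'étiquetage en entités nommées (Cooperation of statistical and symbolic methods for the unsupervised adaptation of a named entity tagging system)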
Frédéric Béchet 1, Benoît Sagot 2, Rosa Stern 2, 3

Named entity recognition and typing can be performed by both symbolic and probabilistic systems. We report on an experiment in which the rule-based system NP, a high-precision system that was developed on AFP news corpora and relies on the Aleda named entity database, interacts with LIANE, a high-recall probabilistic system trained on oral transcriptions from the ESTER corpus. We show that a probabilistic system such as LIANE can be adapted to a new type of corpus in an unsupervised way thanks to large-scale corpora automatically annotated by NP. This adaptation does not require any additional manual annotation and illustrates the complementarity between numeric and symbolic techniques for tackling linguistic tasks.
1:  Laboratoire d'Informatique Fondamentale de Marseille (LIF)
CNRS : UMR6166 – Université de la Méditerranée - Aix-Marseille II – Université de Provence - Aix-Marseille I
2:  ALPAGE (INRIA Rocquencourt)
INRIA – Université Paris VII - Paris Diderot
3:  Medialab AFP
Agence France-Presse
Computer Science/Computation and Language
Named entity recognition – domain adaptation – cooperation between probabilistic and symbolic approaches