Conference paper. Year: 2009

Modular resource development and diagnostic evaluation framework for fast NLP system improvement

Abstract

Natural Language Processing systems are large-scale pieces of software whose development involves many man-years of work, in terms of both coding and resource development. Given a dictionary of 110k lemmas, a few hundred syntactic analysis rules, 20k n-gram matrices and other resources, what will be the impact on a syntactic analyzer of adding a new possible category to a given verb? What will be the consequences of adding a new syntactic rule? Any modification may cause, besides the expected effect, unforeseeable side effects, and the complexity of the system makes it difficult to guess the overall impact of even small changes. We present here a framework designed to improve the accuracy of our linguistic analyzer LIMA through iterative refinements of its linguistic resources. These improvements are continuously assessed by evaluating the analyzer's performance against a reference corpus. Our first results indicate that this framework helps toward this goal.
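
The assessment loop described in the abstract can be illustrated with a minimal sketch. It is not LIMA's actual interface: the function names `accuracy` and `diagnose`, the `(word, tag)` token representation, and the toy data are all assumptions made for illustration. The idea is to run the analyzer before and after a resource change, score both runs against the same reference corpus, and list which tokens were fixed and which were broken, so that side effects become visible even when the overall score does not move.

```python
from typing import Dict, List, Tuple

# Hypothetical token representation: (word form, part-of-speech tag).
Token = Tuple[str, str]


def accuracy(predicted: List[Token], gold: List[Token]) -> float:
    """Fraction of tokens whose predicted tag matches the reference tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned token by token")
    if not gold:
        return 0.0
    correct = sum(p[1] == g[1] for p, g in zip(predicted, gold))
    return correct / len(gold)


def diagnose(before: List[Token], after: List[Token], gold: List[Token]) -> Dict:
    """Compare two analyzer runs against the same reference corpus.

    Returns both accuracies plus the token positions that the change
    fixed (wrong before, right after) and broke (right before, wrong after).
    """
    fixed, broken = [], []
    for i, (b, a, g) in enumerate(zip(before, after, gold)):
        if b[1] != g[1] and a[1] == g[1]:
            fixed.append(i)
        elif b[1] == g[1] and a[1] != g[1]:
            broken.append(i)
    return {
        "accuracy_before": accuracy(before, gold),
        "accuracy_after": accuracy(after, gold),
        "fixed": fixed,
        "broken": broken,
    }


# Toy data (invented): the change adds a VERB reading for "watch".
gold = [("the", "DET"), ("watch", "NOUN"), ("works", "VERB"),
        ("they", "PRON"), ("watch", "VERB")]
before = [("the", "DET"), ("watch", "NOUN"), ("works", "VERB"),
          ("they", "PRON"), ("watch", "NOUN")]
after = [("the", "DET"), ("watch", "VERB"), ("works", "VERB"),
         ("they", "PRON"), ("watch", "VERB")]

report = diagnose(before, after, gold)
print(report)
```

In this toy run the overall accuracy is the same before and after the change, yet one token is fixed and another is broken. A per-token diagnostic of this kind is what makes such a side effect visible, where a single global score would hide it.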

Main file
ResourceDevelopmentDiagnosticFrameworkNLPImprovement-DechalendarNouvel-2009.pdf (247.36 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00568775, version 1 (23-02-2011)

Identifiers

  • HAL Id: hal-00568775, version 1

Cite

Gaël de Chalendar, Damien Nouvel. Modular resource development and diagnostic evaluation framework for fast NLP system improvement. North American Chapter of the Association for Computational Linguistics - Human Language Technologies 2009, May 2009, Boulder, United States. ⟨hal-00568775⟩
