HAL open archive
Conference paper, year: 2007

C-structures and f-structures for the British National Corpus

Abstract

We describe how the British National Corpus (BNC), a one-hundred-million-word balanced corpus of British English, was parsed into Lexical Functional Grammar (LFG) c-structures and f-structures using a treebank-based parsing architecture. The architecture uses a state-of-the-art statistical parser and reranker trained on the Penn Treebank to produce context-free phrase structure trees, and an annotation algorithm that automatically annotates these trees to produce LFG f-structures. We describe the pre-processing steps taken to accommodate the differences between the Penn Treebank and the BNC, and discuss some of the issues encountered in applying the parsing architecture at such a large scale. We also describe how a gold standard set of 1,000 parse trees was annotated. We present results obtained by evaluating the c-structures produced by the statistical parser against the c-structure gold standard, and the f-structures produced by the annotation algorithm against an automatically constructed f-structure gold standard. The c-structures achieve an f-score of 83.7% and the f-structures an f-score of 91.2%.
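The abstract reports f-scores for both c-structures and f-structures. As a rough illustration of what such a score measures, here is a minimal sketch of a micro-averaged precision/recall/f-score computation, assuming each parse is reduced to a multiset of hashable items (for example labelled brackets for c-structures, or (relation, head, dependent) triples for f-structures). The function names `counts` and `corpus_prf` and the example triples are hypothetical; this is not the evaluation software used in the paper.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple

Item = Hashable  # e.g. ("subj", "sleep", "cat") or ("NP", 0, 2)


def counts(gold: Iterable[Item], test: Iterable[Item]) -> Tuple[int, int, int]:
    """Return (matched, gold total, test total) for one sentence.

    Counter '&' keeps the minimum count of each item, i.e. multiset
    intersection, so a duplicated item only matches as often as it
    occurs in both the gold and the test output.
    """
    g, t = Counter(gold), Counter(test)
    return sum((g & t).values()), sum(g.values()), sum(t.values())


def corpus_prf(pairs: List[Tuple[List[Item], List[Item]]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and balanced f-score.

    Counts are pooled over all sentences before dividing, so longer
    sentences contribute proportionally more items.
    """
    matched = n_gold = n_test = 0
    for gold, test in pairs:
        m, g, t = counts(gold, test)
        matched, n_gold, n_test = matched + m, n_gold + g, n_test + t
    p = matched / n_test if n_test else 0.0
    r = matched / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Hypothetical example: (relation, head, dependent) triples for two sentences.
gold_1 = [("subj", "sleep", "cat"), ("spec", "cat", "the"), ("tense", "sleep", "pres")]
test_1 = [("subj", "sleep", "cat"), ("adjunct", "sleep", "the"), ("tense", "sleep", "pres")]
gold_2 = [("subj", "run", "dog"), ("tense", "run", "past")]
test_2 = [("subj", "run", "dog"), ("tense", "run", "past")]

p, r, f = corpus_prf([(gold_1, test_1), (gold_2, test_2)])
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.800 R=0.800 F=0.800
```

Pooling counts over the whole test set (rather than averaging per-sentence scores) is the common convention for corpus-level parser evaluation; in the example, 4 of 5 test items match 4 of 5 gold items.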
Main file
lfg07wagneretal.pdf (112.54 KB)
Origin: files produced by the author(s)

Dates and versions

inria-00545440, version 1 (10 December 2010)

Identifiers

  • HAL Id: inria-00545440, version 1

Cite

Joachim Wagner, Djamé Seddah, Jennifer Foster, Josef van Genabith. C-structures and f-structures for the British National Corpus. Proceedings of the Twelfth International Lexical Functional Grammar Conference, 2007, Stanford, CA, United States. ⟨inria-00545440⟩