Conference paper. Year: 2007

Adapting WSJ-trained parsers to the British National Corpus using in-domain self-training

Abstract

We introduce a set of 1,000 gold standard parse trees for the British National Corpus (BNC) and perform a series of self-training experiments with Charniak and Johnson's reranking parser and BNC sentences. We show that retraining this parser with a combination of one million BNC parse trees (produced by the same parser) and the original WSJ training data yields improvements of 0.4% on WSJ Section 23 and 1.7% on the new BNC gold standard set.
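The self-training procedure summarised in the abstract (parse in-domain sentences with the existing parser, add the automatic parses to the original training data, then retrain) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `train_toy_model` is a hypothetical most-frequent-label tagger standing in for a real parser trainer, `self_train` is a hypothetical helper name, and the tiny data set is invented. It is not the Charniak and Johnson reranking parser and does not reproduce the reported results.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Sentence = List[str]
Labeled = Tuple[Sentence, List[str]]      # (tokens, one label per token)
Model = Callable[[Sentence], List[str]]   # maps tokens to predicted labels


def train_toy_model(data: List[Labeled]) -> Model:
    """Hypothetical stand-in for a parser trainer: most frequent label per word."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    overall: Counter = Counter()
    for tokens, labels in data:
        for tok, lab in zip(tokens, labels):
            counts[tok][lab] += 1
            overall[lab] += 1
    default = overall.most_common(1)[0][0]  # fallback label for unseen words

    def predict(tokens: Sentence) -> List[str]:
        return [counts[t].most_common(1)[0][0] if t in counts else default
                for t in tokens]

    return predict


def self_train(train: Callable[[List[Labeled]], Model],
               gold: List[Labeled],
               unlabeled: List[Sentence]) -> Model:
    """One round of self-training: label in-domain text, then retrain."""
    base = train(gold)                          # 1. train on hand-annotated data
    auto = [(s, base(s)) for s in unlabeled]    # 2. label in-domain sentences
    return train(gold + auto)                   # 3. retrain on the combination


# Invented toy data: "gold" plays the role of the WSJ training set,
# "in_domain" the role of unannotated BNC sentences.
gold = [
    (["the", "dog", "barks"], ["DT", "NN", "VBZ"]),
    (["a", "cat", "sleeps"], ["DT", "NN", "VBZ"]),
    (["dog", "food"], ["NN", "NN"]),
]
in_domain = [["the", "lorry", "sleeps"]]

adapted = self_train(train_toy_model, gold, in_domain)
print(adapted(["a", "lorry", "barks"]))
```

The sketch keeps the trainer as a parameter so that the three steps of the loop stay visible; in a real setting the trainer would be the parser's own training routine and the automatically labelled set would be far larger than the hand-annotated one.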
Main file
jfoster_et_al_07.pdf (28.79 KB)
Origin: files produced by the author(s)

Dates and versions

inria-00545429, version 1 (10-12-2010)

Identifiers

  • HAL Id: inria-00545429, version 1

Cite

Jennifer Foster, Joachim Wagner, Djamé Seddah, Josef van Genabith. Adapting WSJ-trained parsers to the British National Corpus using in-domain self-training. Proceedings of the 10th International Conference on Parsing Technologies: IWPT '07, Association for Computational Linguistics, 2007, Prague, Czech Republic. pp. 33-35. ⟨inria-00545429⟩
