Improving generative statistical parsing with semi-supervised word clustering

Marie Candito 1 Benoît Crabbé 1
1 ALPAGE - Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing
Inria Paris-Rocquencourt, UPD7 - Université Paris Diderot - Paris 7
Abstract: We present a semi-supervised method to improve statistical parsing performance. We focus on the well-known problem of lexical data sparseness and report experiments in which words are clustered prior to parsing. We combine lexicon-aided morphological clustering, which preserves tagging ambiguity, with unsupervised word clustering trained on a large unannotated corpus. We apply these clusterings to the French Treebank and train a parser with the unlexicalized PCFG-LA algorithm of Petrov et al. (2006). French parsing performance improves from a baseline of F1=86.76% to F1=87.37% with morphological clustering, and to F1=88.29% when unsupervised clustering is added. This is the best known score for French probabilistic parsing. These preliminary results are encouraging for statistical parsing of morphologically rich languages and of languages with a small amount of annotated data.
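To make the idea of clustering words before parsing more concrete, here is a minimal sketch of the preprocessing step, not the authors' implementation. It assumes two hypothetical lookup tables (`MORPH_CLASS` and `UNSUP_CLUSTER`, with invented toy entries and cluster labels) and replaces each token by a coarser symbol before the treebank would be handed to a parser. The morphological step is reduced to a plain dictionary lookup and does not model how the paper preserves tagging ambiguity.

```python
from typing import Dict, List

# Hypothetical toy resources, invented for illustration only.
# A real setup would derive these from a morphological lexicon and
# from clusters induced on a large unannotated corpus.
MORPH_CLASS: Dict[str, str] = {
    "mange": "manger",
    "mangeons": "manger",
    "chats": "chat",
}
UNSUP_CLUSTER: Dict[str, str] = {
    "manger": "C0110",
    "chat": "C1011",
    "chien": "C1011",
}


def cluster_token(token: str) -> str:
    """Map a word form to a coarser symbol, falling back to the lowercased form."""
    key = token.lower()
    # Step 1: collapse inflected forms into a morphological class.
    morph = MORPH_CLASS.get(key, key)
    # Step 2: map that class to an unsupervised cluster label if one exists.
    return UNSUP_CLUSTER.get(morph, morph)


def cluster_sentence(tokens: List[str]) -> List[str]:
    """Apply the two-step mapping to every token of a sentence."""
    return [cluster_token(t) for t in tokens]


if __name__ == "__main__":
    print(cluster_sentence(["Mangeons", "chats", "le"]))
```

In this sketch, rare inflected forms such as "mangeons" and "mange" end up sharing one terminal symbol, which is the sense in which clustering is meant to reduce lexical data sparseness; the same mapping would have to be applied to both training and test sentences.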
Document type:
Conference paper
Association for Computational Linguistics. 11th International Conference on Parsing Technologies - IWPT'09, Oct 2009, Paris, France. pp.169-172, 2009

https://hal.archives-ouvertes.fr/hal-00495267
Contributor: Marie Candito
Submitted on: Tuesday, 7 September 2010 - 15:46:14
Last modified on: Friday, 4 January 2019 - 17:33:24
Document(s) archived on: Wednesday, 8 December 2010 - 02:26:38

File

IWPT09-clustering.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00495267, version 1

Citation

Marie Candito, Benoît Crabbé. Improving generative statistical parsing with semi-supervised word clustering. Association for Computational Linguistics. 11th International Conference on Parsing Technologies - IWPT'09, Oct 2009, Paris, France. pp.169-172, 2009. 〈hal-00495267〉

Metrics

Record views

491

File downloads

177