Conference paper. Year: 2011

#hardtoparse: POS Tagging and Parsing the Twitterverse

Abstract

We evaluate the statistical dependency parser, Malt, on a new dataset of sentences taken from tweets. We use a version of Malt which is trained on gold standard phrase structure Wall Street Journal (WSJ) trees converted to Stanford labelled dependencies. We observe a drastic drop in performance moving from our in-domain WSJ test set to the new Twitter dataset, much of which has to do with the propagation of part-of-speech tagging errors. Retraining Malt on dependency trees produced by a state-of-the-art phrase structure parser, which has itself been self-trained on web material, results in a significant improvement. We analyse this improvement by examining in detail the effect of the retraining on individual dependency types.
Main file
aaai_mt_2011.pdf (211.6 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00702445, version 1 (30-05-2012)

Identifiers

  • HAL Id: hal-00702445, version 1

Cite

Jennifer Foster, Özlem Çetinoglu, Joachim Wagner, Joseph Le Roux, Stephen Hogan, et al. #hardtoparse: POS Tagging and Parsing the Twitterverse. AAAI 2011 Workshop On Analyzing Microtext, 2011, United States. pp. 20-25. ⟨hal-00702445⟩
