#hardtoparse: POS Tagging and Parsing the Twitterverse
Abstract
We evaluate the statistical dependency parser Malt on a new dataset of sentences taken from tweets. We use a version of Malt trained on gold-standard phrase structure Wall Street Journal (WSJ) trees converted to Stanford labelled dependencies. We observe a drastic drop in performance when moving from our in-domain WSJ test set to the new Twitter dataset, much of which is due to the propagation of part-of-speech tagging errors. Retraining Malt on dependency trees produced by a state-of-the-art phrase structure parser, which has itself been self-trained on web material, results in a significant improvement. We analyse this improvement by examining in detail the effect of the retraining on individual dependency types.
Domains
Computation and Language [cs.CL]
Origin: Files produced by the author(s)