HAL : hal-00702445, version 1

AAAI 2011 Workshop on Analyzing Microtext, United States (2011)
#hardtoparse: POS Tagging and Parsing the Twitterverse
Jennifer Foster 1, Özlem Çetinoglu 1, Joachim Wagner 1, Joseph Le Roux 2, Stephen Hogan 1, Joakim Nivre 3, Deirdre Hogan 1, Josef Van Genabith 1
(2011)

We evaluate the statistical dependency parser, Malt, on a new dataset of sentences taken from tweets. We use a version of Malt which is trained on gold standard phrase structure Wall Street Journal (WSJ) trees converted to Stanford labelled dependencies. We observe a drastic drop in performance moving from our in-domain WSJ test set to the new Twitter dataset, much of which has to do with the propagation of part-of-speech tagging errors. Retraining Malt on dependency trees produced by a state-of-the-art phrase structure parser, which has itself been self-trained on web material, results in a significant improvement. We analyse this improvement by examining in detail the effect of the retraining on individual dependency types.
1 :  National Centre for Language Technology (NCLT)
Dublin City University
2 :  Laboratoire d'informatique Fondamentale de Marseille (LIF)
CNRS : UMR6166 – Université de la Méditerranée - Aix-Marseille II – Université de Provence - Aix-Marseille I
3 :  Uppsala University
Uppsala University
Computer Science/Computation and Language
Files attached to this document:
PDF
aaai_mt_2011.pdf(239.8 KB)