| HAL : hal-00702445, version 1 |
|
|
| AAAI 2011 Workshop On Analyzing Microtext, United States (2011) |
|
|
|
|
| #hardtoparse: POS Tagging and Parsing the Twitterverse |
|
|
| Jennifer Foster 1, Özlem Çetinoglu 1 |
|
|
| (2011) |
|
|
| We evaluate the statistical dependency parser, Malt, on a new dataset of sentences taken from tweets. We use a version of Malt which is trained on gold standard phrase structure Wall Street Journal (WSJ) trees converted to Stanford labelled dependencies. We observe a drastic drop in performance moving from our in-domain WSJ test set to the new Twitter dataset, much of which has to do with the propagation of part-of-speech tagging errors. Retraining Malt on dependency trees produced by a state-of-the-art phrase structure parser, which has itself been self-trained on web material, results in a significant improvement. We analyse this improvement by examining in detail the effect of the retraining on individual dependency types. |
|
| 1 : | National Centre for Language Technology (NCLT) |
| Dublin City University | |
| 2 : | Laboratoire d'informatique Fondamentale de Marseille (LIF) |
| CNRS : UMR6166 – Université de la Méditerranée - Aix-Marseille II – Université de Provence - Aix-Marseille I | |
| 3 : | Uppsala University |
| Uppsala University | |
|
| Domain | : | Computer Science/Computation and Language |
|
|
|
|
|
| hal-00702445, version 1 | |
| http://hal.archives-ouvertes.fr/hal-00702445 | |
| oai:hal.archives-ouvertes.fr:hal-00702445 | |
| Contributor: Joseph Le Roux | |
| Submitted on: Wednesday 30 May 2012, 11:45:09 | |
| Last modified on: Friday 29 June 2012, 16:11:30 | |