#hardtoparse: POS Tagging and Parsing the Twitterverse

Jennifer Foster; Özlem Çetinoglu; Joachim Wagner; Joseph Le Roux; Stephen Hogan; Joakim Nivre; Deirdre Hogan; Josef van Genabith

Conference paper. Year: 2011


Jennifer Foster

Role: Author

National Centre for Language Technology

Özlem Çetinoglu

Role: Author

National Centre for Language Technology

Joachim Wagner

Role: Author

National Centre for Language Technology

Joseph Le Roux

Role: Author
PersonId : 1192450
IdHAL : joseph-le-roux
ORCID : 0000-0002-3889-8536

Laboratoire d'informatique Fondamentale de Marseille - UMR 6166

Stephen Hogan

Role: Author

National Centre for Language Technology

Joakim Nivre

Role: Author
PersonId : 878440

Uppsala University

Deirdre Hogan

Role: Author

National Centre for Language Technology

Josef van Genabith

Role: Author

National Centre for Language Technology

Abstract

We evaluate the statistical dependency parser, Malt, on a new dataset of sentences taken from tweets. We use a version of Malt which is trained on gold standard phrase structure Wall Street Journal (WSJ) trees converted to Stanford labelled dependencies. We observe a drastic drop in performance moving from our in-domain WSJ test set to the new Twitter dataset, much of which has to do with the propagation of part-of-speech tagging errors. Retraining Malt on dependency trees produced by a state-of-the-art phrase structure parser, which has itself been self-trained on web material, results in a significant improvement. We analyse this improvement by examining in detail the effect of the retraining on individual dependency types.

Domains

Computation and Language [cs.CL]

Main file

aaai_mt_2011.pdf (211.6 KB)

Origin: Files produced by the author(s)

Contributor: Joseph Le Roux

https://hal.science/hal-00702445

Submitted on: Wednesday, 30 May 2012, 11:45:09

Last modified on: Friday, 24 March 2023, 14:52:55

Long-term archiving on: Thursday, 15 December 2016, 10:12:08

Dates and versions

hal-00702445 , version 1 (30-05-2012)

Identifiers

HAL Id : hal-00702445 , version 1

Cite

Jennifer Foster, Özlem Çetinoglu, Joachim Wagner, Joseph Le Roux, Stephen Hogan, et al. #hardtoparse: POS Tagging and Parsing the Twitterverse. AAAI 2011 Workshop On Analyzing Microtext, 2011, United States. pp.20-25. ⟨hal-00702445⟩


Collections

LIF CNRS UNIV-AMU LIS-LAB ANR

569 views

464 downloads
