Handling Normalization Issues for Part-of-Speech Tagging of Online Conversational Text

Abstract: When POS tagging noisy user-generated text, should normalization be handled as a preliminary task, or can misspelled words be handled directly in the POS tagging model? In this paper we propose a combined approach in which some errors are normalized before tagging, while a tagger based on a Gated Recurrent Unit deep neural network handles the remaining errors. Word embeddings are trained on a large corpus to address both normalization and POS tagging. Experiments are run on Contact Center chat conversations, a particular type of formal Computer Mediated Communication data.
Document type:
Conference paper
Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018, Miyazaki, Japan. LREC proceedings

https://hal.archives-ouvertes.fr/hal-01943391
Contributor: Jeremy Auguste
Submitted on: Monday, 3 December 2018 - 17:14:53
Last modified on: Tuesday, 18 December 2018 - 08:04:05

File

357.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-01943391, version 1

Citation

Géraldine Damnati, Jeremy Auguste, Alexis Nasr, Delphine Charlet, Johannes Heinecke, et al. Handling Normalization Issues for Part-of-Speech Tagging of Online Conversational Text. Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018, Miyazaki, Japan. LREC proceedings. 〈hal-01943391〉
