Handling Normalization Issues for Part-of-Speech Tagging of Online Conversational Text

Abstract: When POS tagging noisy user-generated text, should normalization be handled as a preliminary task, or can misspelled words be handled directly in the POS tagging model? In this paper we propose a combined approach in which some errors are normalized before tagging, while a tagger based on a Gated Recurrent Unit deep neural network handles the remaining errors. Word embeddings are trained on a large corpus in order to address both normalization and POS tagging. Experiments are run on Contact Center chat conversations, a particular type of formal Computer-Mediated Communication data.

https://hal.archives-ouvertes.fr/hal-01943391
Contributor: Jeremy Auguste
Submitted on: Monday, December 3, 2018 - 5:14:53 PM
Last modification on: Tuesday, April 16, 2019 - 1:41:20 AM
Document(s) archived on: Monday, March 4, 2019 - 3:08:01 PM

File

357.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id: hal-01943391, version 1

Citation

Géraldine Damnati, Jeremy Auguste, Alexis Nasr, Delphine Charlet, Johannes Heinecke, et al. Handling Normalization Issues for Part-of-Speech Tagging of Online Conversational Text. Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018, Miyazaki, Japan. ⟨hal-01943391⟩
