Journal articles

Lexical Normalization of Spanish Tweets with Rule-Based Components and Language Models

Abstract : This paper presents a system for normalizing Spanish tweets. It combines preprocessing rules, a domain-appropriate edit-distance model, and language models that select correction candidates based on context. The system improves on the tool we submitted to the Tweet-Norm 2013 shared task, and its results on the task's test corpus are above average. We also study how each component of the system (rule-based, edit-distance-based, and statistical) affects tweet normalization.
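The three components named in the abstract can be illustrated with a toy pipeline. This is only a minimal sketch under stated assumptions, not the authors' implementation: the lexicon, the corpus, the repeated-letter rule, the unit-cost Levenshtein distance, and the add-one-smoothed bigram model are all hypothetical stand-ins chosen for brevity, and every function name is invented for illustration.

```python
import re
from collections import Counter

# Hypothetical toy resources; the paper's actual lexicon and corpora are not given here.
LEXICON = {"que", "casa", "cosa", "hola", "la", "es", "mi", "bonita"}
CORPUS = "hola que tal . mi casa es bonita . la casa es mi casa . que cosa".split()

def preprocess(token):
    """Rule-based step (assumed rule): lowercase, collapse 3+ repeated characters."""
    return re.sub(r"(.)\1{2,}", r"\1", token.lower())

def edit_distance(a, b):
    """Plain Levenshtein distance with unit costs (a stand-in for a tuned model)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def candidates(token, max_dist=1):
    """In-lexicon tokens are kept; otherwise return lexicon words within max_dist."""
    if token in LEXICON:
        return [token]
    return sorted(w for w in LEXICON if edit_distance(token, w) <= max_dist) or [token]

UNIGRAMS = Counter(CORPUS)
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))

def bigram_score(prev_word, word):
    """Add-one smoothed estimate of P(word | prev_word)."""
    return (BIGRAMS[(prev_word, word)] + 1) / (UNIGRAMS[prev_word] + len(UNIGRAMS))

def normalize(tokens):
    """Pick, for each token, the candidate the bigram model prefers in context."""
    out = []
    for tok in tokens:
        tok = preprocess(tok)
        prev_word = out[-1] if out else "."  # "." marks a sentence start
        out.append(max(candidates(tok), key=lambda w: bigram_score(prev_word, w)))
    return out
```

With these toy resources, the out-of-vocabulary token "cisa" has two candidates at distance 1, "casa" and "cosa". The bigram model resolves it to "casa" after "mi" and to "cosa" after "que", which is the kind of context-based selection the abstract describes.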

Cited literature : 18 references

https://hal.archives-ouvertes.fr/hal-01099241
Contributor : Pablo Ruiz Fabo
Submitted on : Thursday, January 1, 2015 - 8:34:41 PM
Last modification on : Wednesday, April 22, 2020 - 4:56:03 PM
Document(s) archived on : Thursday, April 2, 2015 - 10:07:07 AM

File

4902-4167-1-PB.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-01099241, version 1

Citation

Pablo Ruiz, Montse Cuadros, Thierry Etchegoyhen. Lexical Normalization of Spanish Tweets with Rule-Based Components and Language Models. Procesamiento del Lenguaje Natural, Sociedad Española para el Procesamiento del Lenguaje Natural, 2014, pp.8. ⟨hal-01099241⟩
