Conference papers

Understanding Social Media Texts with Minimum Human Effort on #Twitter

Abstract : Named Entity Recognition (NER) is a traditional Natural Language Processing (NLP) task. However, traditional machine learning methods face new problems when applied to this task on social media data such as Twitter, where their performance is often degraded. Twitter messages have particular features. Consider the example "Today wasz Fun cusz anna Came juss for me <3: hahaha". Here the difficulties are manifold: 1) Spelling mistakes: wasz (was), cusz (because), juss (just); 2) Uppercase/lowercase inversion: Fun (fun), anna (Anna), Came (came); 3) Emoticon: <3; 4) Interjection: hahaha. The alternation of uppercase and lowercase is a major problem for NER because the only person name in the tweet, "anna", begins with a lowercase letter instead of the uppercase letter it would have in grammatically well-formed text. In this paper, we present our work on recognizing named entities on Twitter.
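
The capitalization problem can be made concrete with a short Python sketch. It is a hypothetical illustration, not the authors' method. It applies a naive heuristic often associated with well-formed text: treat any non-initial token starting with an uppercase letter as a name candidate. It also applies a toy normalization table built only from the three spelling corrections given in the abstract. The names `naive_name_candidates`, `normalize` and `NORMALIZATION` are invented here for illustration.

```python
# Hypothetical illustration only: not the method described in the paper.
tweet = "Today wasz Fun cusz anna Came juss for me <3: hahaha"

# Toy lookup table containing only the corrections listed in the abstract.
NORMALIZATION = {"wasz": "was", "cusz": "because", "juss": "just"}


def naive_name_candidates(text):
    """Flag whitespace-separated tokens that start with an uppercase letter,
    skipping the first token (a sentence-initial capital says little about names)."""
    tokens = text.split()
    return [tok for i, tok in enumerate(tokens) if i > 0 and tok[:1].isupper()]


def normalize(text):
    """Replace known misspellings token by token; unknown tokens are kept as-is."""
    return " ".join(NORMALIZATION.get(tok, tok) for tok in text.split())


print(naive_name_candidates(tweet))  # ['Fun', 'Came']: the real name "anna" is missed
print(normalize(tweet))              # spelling fixed, but the casing problem remains
```

The heuristic flags "Fun" and "Came" (false positives) and misses "anna" (a false negative), which is exactly the case inversion described above. Normalizing spelling alone does not repair it.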

https://hal.archives-ouvertes.fr/hal-01490018
Contributor : Marco Dinarelli
Submitted on : Tuesday, March 14, 2017 - 6:09:04 PM
Last modification on : Thursday, April 2, 2020 - 1:28:58 PM
Document(s) archived on : Thursday, June 15, 2017 - 3:13:08 PM

File

PLIN2016.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01490018, version 1

Citation

Tian Tian, Isabelle Tellier, Marco Dinarelli, Pedro Cardoso. Understanding Social Media Texts with Minimum Human Effort on #Twitter. Language and the new (instant) media (PLIN), May 2016, Louvain-la-Neuve, Belgium. ⟨hal-01490018⟩
