Conference papers

Understanding Social Media Texts with Minimum Human Effort on #Twitter

Abstract : Named Entity Recognition (NER) is a traditional Natural Language Processing (NLP) task, but traditional machine learning methods face new problems when applied to social media data such as Twitter, where performance often degrades. Twitter messages have particular features. Consider the example "Today wasz Fun cusz anna Came juss for me <3: hahaha". Here the difficulties are manifold: 1) Spelling mistakes: wasz (was), cusz (because), juss (just); 2) Uppercase/lowercase inversion: Fun (fun), anna (Anna), Came (came); 3) Emoticon: <3; 4) Interjection: hahaha. The uppercase/lowercase inversion is a major problem for the NER task because the only personal proper noun in our tweet, "anna", begins with a lowercase letter instead of the uppercase letter it would have in a grammatically well-formed text. In this paper, we present our work on recognizing named entities on Twitter.
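
As an illustration only, the difficulties listed in the abstract can be made concrete with a small Python sketch that flags them in the example tweet. This is not the method presented in the paper: the lexicon `SPELLING_VARIANTS`, the gazetteer `KNOWN_NAMES`, the regular expressions, and the function `flag_noise` are all hypothetical names, hand-built from the single example above.

```python
import re

# Hypothetical resources, hand-built from the abstract's one example tweet.
SPELLING_VARIANTS = {"wasz": "was", "cusz": "because", "juss": "just"}
KNOWN_NAMES = {"anna"}

# Keep the "<3" emoticon as one token; otherwise take runs of letters.
TOKEN_RE = re.compile(r"<3|[A-Za-z]+")
# Laughter written as "ha" repeated at least twice (haha, hahaha, ...).
INTERJECTION_RE = re.compile(r"(?:ha){2,}")


def flag_noise(tweet):
    """Return (token, label) pairs for tokens showing one of the listed difficulties."""
    flagged = []
    for index, token in enumerate(TOKEN_RE.findall(tweet)):
        lowered = token.lower()
        if token == "<3":
            label = "emoticon"
        elif lowered in SPELLING_VARIANTS:
            label = "spelling"
        elif INTERJECTION_RE.fullmatch(lowered):
            label = "interjection"
        elif token.islower() and lowered in KNOWN_NAMES:
            # A known name written without its capital letter.
            label = "case"
        elif index > 0 and token[0].isupper() and lowered not in KNOWN_NAMES:
            # A capitalized word that is neither sentence-initial nor a known name.
            label = "case"
        else:
            continue
        flagged.append((token, label))
    return flagged


if __name__ == "__main__":
    for token, label in flag_noise("Today wasz Fun cusz anna Came juss for me <3: hahaha"):
        print(f"{token}\t{label}")
```

On the example tweet, this flags wasz, cusz and juss as spelling variants, Fun, anna and Came as case inversions, <3 as an emoticon, and hahaha as an interjection. The case rule shows why capitalization is an unreliable cue here: without the hand-made gazetteer, "anna" would pass unnoticed, and the same rule would wrongly flag any genuine mid-sentence proper noun missing from the gazetteer.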

Contributor : Marco Dinarelli
Submitted on : Tuesday, March 14, 2017 - 6:09:04 PM
Last modification on : Friday, October 15, 2021 - 1:40:08 PM
Long-term archiving on : Thursday, June 15, 2017 - 3:13:08 PM


Files produced by the author(s)


  • HAL Id : hal-01490018, version 1



Tian Tian, Isabelle Tellier, Marco Dinarelli, Pedro Cardoso. Understanding Social Media Texts with Minimum Human Effort on #Twitter. Language and the new (instant) media (PLIN), May 2016, Louvain-la-Neuve, Belgium. ⟨hal-01490018⟩


