De la segmentation dans les tweets : signes de ponctuation, connecteurs, émoticônes et émojis [On segmentation in tweets: punctuation marks, connectors, emoticons and emojis]

Abstract : Drawing on a corpus of 3,444,075 tweets, amounting to 44,107,210 tokens (words, punctuation marks, emojis, emoticons, etc.) collected in December 2016, this paper examines the segmentation processes at work in tweets. After outlining some characteristics of this particular kind of writing, we review the general means of segmentation in writing, namely punctuation and connectors. We then look at how these operate in tweets. Finally, we show that emoticons and emojis are processes specific to tweets (and to other digital writing, such as SMS and email) that allow users to diversify their segmentation strategies.
Document type :
Journal articles
Cited literature [32 references]

https://hal.archives-ouvertes.fr/hal-02496765
Contributor : Pierre Halté
Submitted on : Tuesday, March 3, 2020 - 11:49:20 AM
Last modification on : Tuesday, May 12, 2020 - 3:56:13 PM
Long-term archiving on : Thursday, June 4, 2020 - 1:37:21 PM

File

Magué Rossi-Gensane Halté Se...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02496765, version 1

Citation

Jean-Philippe Magué, Nathalie Rossi-Gensane, Pierre Halté. De la segmentation dans les tweets : signes de ponctuation, connecteurs, émoticônes et émojis. Corpus, Bases, Corpus, Langage - UMR 7320, 2020, Corpus complexes. Traitements, standardisation et analyse des corpus de communication médiée par les réseaux. ⟨hal-02496765⟩
