Conference paper, Year: 2018

Using Elasticsearch for Linguistic Analysis of Tweets in Time and Space

Antonio Ruiz Tinoco
  • Role: Author
  • PersonId : 1032327

Abstract

The collection and analysis of microtexts is both straightforward from a computational viewpoint and complex from a scientific perspective: such texts often feature non-standard data and are accompanied by a profusion of metadata. We address corpus construction and visualization issues in order to study spontaneous speech and variation through short messages. To this end, we introduce an experimental setting based on a generic NoSQL database (Elasticsearch) and its front-end (Kibana). We focus on Spanish and German and present concrete examples of faceted searches on short messages coming from the Twitter platform. The results are discussed with a particular emphasis on the impact of querying and visualization techniques, first for longitudinal studies over time and second for results aggregated from a spatial perspective.
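As a rough illustration of what a faceted search over time and space can look like in Elasticsearch, the sketch below builds a search body that combines a `date_histogram` aggregation with a `geohash_grid` aggregation. It is not taken from the paper: the field names (`text`, `lang`, `created_at`, `coordinates`), the helper `build_faceted_query`, and the search term are assumptions made for the example, and the index mapping would need a date field and a `geo_point` field for it to apply.

```python
import json


def build_faceted_query(term, lang, precision=4):
    """Build a search body for one term, bucketed by month and by geohash cell.

    All field names are hypothetical; adapt them to the actual index mapping.
    """
    return {
        "size": 0,  # return aggregation buckets only, no individual hits
        "query": {
            "bool": {
                "must": [{"match": {"text": term}}],
                "filter": [{"term": {"lang": lang}}],
            }
        },
        "aggs": {
            # Longitudinal view: number of matching messages per month.
            # Older Elasticsearch releases use "interval" instead of
            # "calendar_interval".
            "over_time": {
                "date_histogram": {
                    "field": "created_at",
                    "calendar_interval": "month",
                }
            },
            # Spatial view: matching messages grouped into geohash cells.
            "in_space": {
                "geohash_grid": {"field": "coordinates", "precision": precision}
            },
        },
    }


# "ejemplo" is a placeholder search term, not one studied in the paper.
body = build_faceted_query("ejemplo", "es")
print(json.dumps(body, indent=2))
```

The resulting JSON could be sent to an index's `_search` endpoint or pasted into the Kibana console. Setting `size` to 0 keeps the response limited to the two sets of buckets, which is the form a time chart or a map visualization would consume.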
Main file
Barbaresi-Ruiz-Tinoco_Elasticsearch-Tweets_CMLC2018.pdf (691.85 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01798706, version 1 (23-05-2018)

Identifiers

  • HAL Id: hal-01798706, version 1

Cite

Adrien Barbaresi, Antonio Ruiz Tinoco. Using Elasticsearch for Linguistic Analysis of Tweets in Time and Space. LREC 2018, May 2018, Miyazaki, Japan. pp.14-19. ⟨hal-01798706⟩

Collections

GENCI
177 views
516 downloads
