Directions of work or proceedings

Tweetaneuse : Fouille de motifs en caractères et plongement lexical à l’assaut du deft 2017

Davide Buscaldi (1), Aude Grezka (1), Gaël Lejeune (2)
(2) TALN, LINA - Laboratoire d'Informatique de Nantes Atlantique
Abstract : This article describes the methods developed by the TWEETANEUSE team for the 2017 edition of the French text mining challenge (DEFT 2017). This year the challenge was dedicated to tweet classification: polarity detection and figurative language detection. The first method relies on character-level patterns used as features for training a one-vs-rest classifier. These patterns are known as "frequent closed patterns without gap" in the data mining community and as maximal repeated strings in the text algorithmics community. The two other methods use 13 features computed with lexical resources (FEEL, LabMT and a resource of our own). One of these methods adds a bag-of-words representation of the tweets, while the other adds a word-embedding representation. The character-level method produced the best results, in particular for the second task, figurative tweet detection.
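To make the notion of "frequent closed patterns without gap" concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the `min_support` and `max_len` parameters, and the naive substring enumeration are all assumptions made for illustration. A substring is kept when it occurs at least `min_support` times and no one-character extension of it occurs equally often.

```python
from collections import Counter


def substring_counts(texts, max_len):
    """Count every substring of length 1..max_len (overlapping occurrences)."""
    counts = Counter()
    for text in texts:
        n = len(text)
        for i in range(n):
            for j in range(i + 1, min(n, i + max_len) + 1):
                counts[text[i:j]] += 1
    return counts


def closed_repeated_patterns(texts, min_support=2, max_len=8):
    """Return {pattern: count} for repeated substrings that are closed.

    Hypothetical helper: a pattern is closed when extending it by one
    character, on the left or on the right, always lowers its count.
    """
    # Count one length beyond max_len so the longest patterns can be checked.
    counts = substring_counts(texts, max_len + 1)
    # For each substring, record the highest count among its extensions.
    best_ext = Counter()
    for s, c in counts.items():
        if len(s) > 1:
            for sub in (s[1:], s[:-1]):
                if c > best_ext[sub]:
                    best_ext[sub] = c
    return {
        s: c
        for s, c in counts.items()
        if len(s) <= max_len and c >= min_support and best_ext[s] < c
    }


def featurize(text, vocabulary):
    """Binary feature vector: 1 if the pattern occurs in the text, else 0."""
    return [int(p in text) for p in vocabulary]


patterns = closed_repeated_patterns(["abcabc"])
vocabulary = sorted(patterns)
print(patterns)                        # {'abc': 2}
print(featurize("xxabc", vocabulary))  # [1]
```

Vectors built this way could then be passed to any one-vs-rest classifier. The quadratic enumeration is only suitable for short texts such as tweets; suffix-array or suffix-tree methods would scale better.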

https://hal.archives-ouvertes.fr/hal-02362125
Contributor : Aude Grezka
Submitted on : Wednesday, November 13, 2019 - 5:20:42 PM
Last modification on : Monday, June 8, 2020 - 12:58:08 AM

Identifiers

  • HAL Id : hal-02362125, version 1

Citation

Davide Buscaldi, Aude Grezka, Gaël Lejeune. Tweetaneuse : Fouille de motifs en caractères et plongement lexical à l’assaut du deft 2017. TALN 2017, Jun 2017, Orléans, France. pp. 65-76, 2017, Actes du 13e Défi Fouille de Texte (DEFT 2017). ⟨hal-02362125⟩
