Identification Semi-Automatique de Mots-Germes pour l'Analyse de Sentiments et son Intensité

Amal Htait; Sébastien Fournier; Patrice Bellot

Conference paper. Year: 2017

Semi-Automatic of Germ-Words Identification for Sentiment Analysis and Intensity



Amal Htait

Role: Author
PersonId: 18012
IdHAL: amal-htait

Centre pour l'édition électronique ouverte

Laboratoire d'Informatique et des Systèmes (LIS) (Marseille, Toulon)

Sébastien Fournier

Role: Author
PersonId: 170902
IdHAL: sebastien-fournier
ORCID: 0000-0002-1611-0744
IdRef: 095597050

Patrice Bellot

Role: Author
PersonId: 14204
IdHAL: patrice-bellot
ORCID: 0000-0001-8698-5055
IdRef: 079380956

Data, Information & content MAnagement Group

Laboratoire des Sciences de l'Information et des Systèmes

Abstract

To explore the opinions expressed in tweets, this article presents a sentiment classification of tweet content. First, we present a method for identifying new sentiment seed words. These seed words are used to predict the sentiment intensity of other words and short phrases that co-occur with them. Then, to test sentiment similarity, we use two approaches: similarity measures between words, and the cosine similarity between word embedding representations (e.g. word2vec, GloVe). The experimental results highlight the importance of using seed words adapted to tweets, as well as the importance of corpus size and preprocessing. The best results were achieved using the cosine similarity between word embedding representations.
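The embedding-based approach summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact scoring formula, so the rule used here (mean cosine similarity to positive seed words minus mean cosine similarity to negative seed words) is an assumption made for illustration. The function names, the seed lists, and the two-dimensional toy vectors are likewise hypothetical; in practice the vectors would come from a word2vec or GloVe model trained on tweets.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentiment_score(word, embeddings, pos_seeds, neg_seeds):
    """Assumed scoring rule: mean similarity to positive seeds minus
    mean similarity to negative seeds. Positive values suggest positive sentiment."""
    vec = embeddings[word]
    pos = [cosine(vec, embeddings[s]) for s in pos_seeds if s in embeddings]
    neg = [cosine(vec, embeddings[s]) for s in neg_seeds if s in embeddings]
    if not pos or not neg:
        raise ValueError("need at least one known seed word of each polarity")
    return sum(pos) / len(pos) - sum(neg) / len(neg)


# Hypothetical toy vectors standing in for real word embeddings.
TOY_EMBEDDINGS = {
    "good": [1.0, 0.1],
    "happy": [0.9, 0.2],
    "bad": [-1.0, 0.1],
    "sad": [-0.9, 0.2],
    "awesome": [0.8, 0.3],
    "awful": [-0.8, 0.3],
}
POS_SEEDS = ["good", "happy"]  # illustrative seed words, not the paper's list
NEG_SEEDS = ["bad", "sad"]

if __name__ == "__main__":
    for w in ("awesome", "awful"):
        print(w, round(sentiment_score(w, TOY_EMBEDDINGS, POS_SEEDS, NEG_SEEDS), 3))
```

With this rule the sign of the score gives the polarity and its magnitude can be read as a rough intensity, which is why the choice of seed words matters: seeds that are rare or used differently in tweets would give unreliable reference vectors.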

Keywords

Seed words, Twitter, Similarity Measures, Word Embedding, Word2vec, GloVe

Domains

Computer Science [cs]; Document and Text Processing; Computation and Language [cs.CL]; Information Retrieval [cs.IR]

Main file

RJC_2017_AmalHtait.pdf (323.66 KB)

Origin: Files produced by the author(s)

Contributor: Amal Htait

https://hal.science/hal-01771644

Submitted on: Thursday, April 19, 2018, 16:27:25

Last modified on: Friday, March 22, 2024, 18:24:03

Long-term archiving on: Tuesday, September 18, 2018, 16:04:29

Dates and versions

hal-01771644, version 1 (19-04-2018)

Identifiers

HAL Id: hal-01771644, version 1

Cite

Amal Htait, Sébastien Fournier, Patrice Bellot. Identification Semi-Automatique de Mots-Germes pour l'Analyse de Sentiments et son Intensité. CORIA, Mar 2017, Marseille, France. ⟨hal-01771644⟩


Collections

UNIV-AVIGNON UNIV-TLN CNRS UNIV-AMU EHESS LIS-LAB HESAM IRENAV LAMPA LCPI LABOMAP LISPEN MSMP

190 views

121 downloads
