Multiword Expression Features for Automatic Hate Speech Detection

Nicolas Zampieri; Irina Illina; Dominique Fohr

Conference paper, Year: 2021

Nicolas Zampieri

Role: Author
PersonId : 1089106

Speech Modeling for Facilitating Oral-Based Communication

Irina Illina

Role: Author
PersonId : 15663
IdHAL : irina-illina
IdRef : 120731746

Speech Modeling for Facilitating Oral-Based Communication

Dominique Fohr

Role: Author
PersonId : 15652
IdHAL : dominique-fohr
IdRef : 031092942

Speech Modeling for Facilitating Oral-Based Communication

Abstract

The task of automatically detecting hate speech in social media is gaining increasing attention. Given the enormous volume of content posted daily, human monitoring of hate speech is infeasible. In this work, we propose new word-level features for automatic hate speech detection (HSD): multiword expressions (MWEs). MWEs are lexical units larger than a single word that have idiomatic and compositional meanings. We propose integrating MWE features into a deep neural network-based HSD framework. Our baseline HSD system relies on the Universal Sentence Encoder (USE). To incorporate MWE features, we create a three-branch deep neural network: one branch for USE, one for MWE categories, and one for MWE embeddings. We conduct experiments on two hate speech tweet corpora with different MWE categories and with two types of MWE embeddings, word2vec and BERT. Our experiments demonstrate that the proposed HSD system with MWE features significantly outperforms the baseline system in terms of macro-F1.
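The three-branch design described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the branches are built or merged, so everything below is an assumption made for illustration: each branch is a single ReLU layer, the branch outputs are concatenated, and a softmax layer produces class probabilities. The class name `ThreeBranchClassifier`, the helper functions, the layer sizes, and the random untrained weights are all hypothetical and are not the authors' implementation. The inputs stand in for a precomputed USE sentence vector, a vector of MWE category counts, and a pooled MWE embedding (for example from word2vec or BERT).

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, bias) for a layer; weights has one row per output unit."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, weights, bias):
    """Affine transform: one dot product plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def dense_relu(x, weights, bias):
    """Fully connected layer followed by ReLU."""
    return [max(0.0, z) for z in linear(x, weights, bias)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ThreeBranchClassifier:
    """Hypothetical three-branch network: USE, MWE categories, MWE embeddings."""

    def __init__(self, use_dim, n_mwe_categories, mwe_emb_dim,
                 hidden=8, n_classes=2, seed=0):
        rng = random.Random(seed)
        self.use_branch = init_layer(use_dim, hidden, rng)
        self.cat_branch = init_layer(n_mwe_categories, hidden, rng)
        self.emb_branch = init_layer(mwe_emb_dim, hidden, rng)
        # Assumption: the three branch outputs are merged by concatenation.
        self.output = init_layer(3 * hidden, n_classes, rng)

    def forward(self, use_vec, mwe_cat_vec, mwe_emb_vec):
        merged = (dense_relu(use_vec, *self.use_branch)
                  + dense_relu(mwe_cat_vec, *self.cat_branch)
                  + dense_relu(mwe_emb_vec, *self.emb_branch))
        return softmax(linear(merged, *self.output))


# Toy dimensions chosen only to keep the example small.
model = ThreeBranchClassifier(use_dim=4, n_mwe_categories=3, mwe_emb_dim=5)
probs = model.forward(
    use_vec=[0.1, 0.2, 0.3, 0.4],          # stand-in for a USE sentence vector
    mwe_cat_vec=[1.0, 0.0, 2.0],           # counts of MWEs per category
    mwe_emb_vec=[0.5, -0.2, 0.0, 0.3, 0.1],  # pooled MWE embedding
)
print(probs)
```

Because the weights are random and untrained, the printed probabilities carry no meaning; the sketch only shows how three separate feature streams could feed a single classifier.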

Keywords

Deep learning; Hate speech detection; Social media

Domains

Computer Science [cs]; Document and Text Processing; Social and Information Networks [cs.SI]

Main file

NLDB_2021___Multiword_Expression_Features_for_Automatic_Hate_Speech_Detection__short_paper___.pdf (135.1 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-03231047

Submitted on: Friday, May 21, 2021, 09:31:36

Last modified on: Monday, September 11, 2023, 17:41:19

Long-term archiving on: Sunday, August 22, 2021, 18:13:30

Dates and versions

hal-03231047 , version 1 (21-05-2021)

Identifiers

HAL Id : hal-03231047 , version 1

Cite

Nicolas Zampieri, Irina Illina, Dominique Fohr. Multiword Expression Features for Automatic Hate Speech Detection. NLDB 2021 - 26th International Conference on Natural Language & Information Systems, Jun 2021, Saarbrücken/Virtual, Germany. ⟨hal-03231047⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD

83 views

238 downloads
