A Multilingual Evaluation for Online Hate Speech Detection

Michele Corazza; Stefano Menini; Elena Cabrio; Sara Tonelli; Serena Villata

doi:10.1145/3377323

Journal article, ACM Transactions on Internet Technology, Year: 2020

Michele Corazza

Role: Author

Alma Mater Studiorum Università di Bologna = University of Bologna

Stefano Menini

Role: Author

Fondazione Bruno Kessler [Trento, Italy]

Elena Cabrio

Role: Author

Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis

Sara Tonelli

Role: Author

Fondazione Bruno Kessler [Trento, Italy]

Serena Villata

Role: Author
PersonId : 9409
IdHAL : serena-villata
ORCID : 0000-0003-3495-493X
IdRef : 200242911

Web-Instrumented Man-Machine Interactions, Communities and Semantics

Abstract

The increasing popularity of social media platforms such as Twitter and Facebook has led to a rise in hateful and aggressive speech on these platforms. Despite the number of approaches recently proposed in Natural Language Processing research for detecting these forms of abusive language, identifying hate speech at scale remains an unsolved problem. In this paper, we propose a robust neural architecture that is shown to perform satisfactorily across different languages, namely English, Italian and German. We carry out an extensive analysis of the experimental results over the three languages to better understand the contribution of the different components of the system, both from the architecture point of view (i.e., Long Short-Term Memory, Gated Recurrent Unit, and bidirectional Long Short-Term Memory) and from the feature selection point of view (i.e., n-grams, social-network-specific features, emotion lexica, emojis, word embeddings). For this in-depth analysis, we use three freely available datasets for hate speech detection on social media in English, Italian and German.

Domains

Computer Science / Artificial Intelligence [cs.AI]

Main file

TOIT_CREEP_HAL.pdf (445.28 KB)

Origin: Files produced by the author(s)

Contributor: Serena Villata

https://hal.science/hal-02972184

Submitted on: Tuesday, 20 October 2020, 14:19:50

Last modified on: Monday, 26 February 2024, 11:22:07

Long-term archiving on: Thursday, 21 January 2021, 18:51:27

Dates and versions

hal-02972184 , version 1 (20-10-2020)

Identifiers

HAL Id: hal-02972184, version 1
DOI: 10.1145/3377323

Cite

Michele Corazza, Stefano Menini, Elena Cabrio, Sara Tonelli, Serena Villata. A Multilingual Evaluation for Online Hate Speech Detection. ACM Transactions on Internet Technology, 2020, 20 (2), pp.1-22. ⟨10.1145/3377323⟩. ⟨hal-02972184⟩

Collections

CNRS INRIA I3S WIMMICS INRIA2 UNIV-COTEDAZUR 3IA-COTEDAZUR ANR

384 views

1439 downloads
