A French Corpus for Semantic Similarity - HAL Open Archive
Conference paper. Year: 2020

A French Corpus for Semantic Similarity

Rémi Cardon
  • Role: Author
  • PersonId: 184596
  • IdHAL: remi-cardon

Abstract

Semantic similarity is an area of Natural Language Processing that is useful for several downstream applications, such as machine translation, natural language generation, information retrieval, and question answering. The task consists of assessing the extent to which two sentences express the same meaning. This requires corpora of sentence pairs that have been graded for similarity. The grade is positioned on a given scale, usually ranging from 0 (completely unrelated) to 5 (equivalent semantics). In this work, we introduce such a corpus for French, the first one to our knowledge. It comprises 1,010 sentence pairs with grades from five annotators. We describe the annotation process, analyse the data, and perform a few experiments on the automatic grading of semantic similarity.
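To make the grading task concrete, here is a minimal Python sketch under stated assumptions. It assumes that the grades given by the five annotators to a pair are combined by their arithmetic mean (the corpus may aggregate them differently). It also swaps in a toy baseline, token-set Jaccard overlap rescaled to the 0 to 5 scale, which is not the method used in the paper, and scores it with Pearson correlation, a common metric for graded similarity. The function names (`gold_score`, `jaccard_grade`, `pearson`), the sentence pairs, and their grades are all hypothetical and are not taken from the corpus.

```python
from statistics import mean


def gold_score(annotator_grades):
    """Combine the 0-5 grades given to one sentence pair into a single score.

    Assumption: grades are averaged; the corpus may aggregate them differently.
    """
    if not annotator_grades:
        raise ValueError("at least one grade is required")
    for grade in annotator_grades:
        if not 0 <= grade <= 5:
            raise ValueError(f"grade {grade} is outside the 0-5 scale")
    return mean(annotator_grades)


def jaccard_grade(sentence_a, sentence_b):
    """Toy baseline (hypothetical): token-set Jaccard overlap rescaled to 0-5."""
    tokens_a = set(sentence_a.lower().split())
    tokens_b = set(sentence_b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return 5.0 * len(tokens_a & tokens_b) / len(union)


def pearson(predicted, gold):
    """Pearson correlation between predicted and gold similarity scores."""
    if len(predicted) != len(gold) or len(predicted) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_p, mean_g = mean(predicted), mean(gold)
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(predicted, gold))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    if var_p == 0 or var_g == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / (var_p * var_g) ** 0.5


# Hypothetical pairs with five made-up grades each (not taken from the corpus).
PAIRS = [
    ("le patient a de la fièvre", "le patient a de la fièvre", [5, 5, 5, 5, 5]),
    ("le patient a de la fièvre", "le malade a de la fièvre", [4, 5, 4, 4, 5]),
    ("le patient a de la fièvre", "il pleut à Marseille", [0, 0, 0, 1, 0]),
]

if __name__ == "__main__":
    gold = [gold_score(grades) for _, _, grades in PAIRS]
    predicted = [jaccard_grade(a, b) for a, b, _ in PAIRS]
    print("gold:", gold)
    print("predicted:", [round(p, 2) for p in predicted])
    print("Pearson r:", round(pearson(predicted, gold), 3))
```

Jaccard overlap only counts shared tokens, so it grades the paraphrase pair (*patient* / *malade*) lower than its hypothetical gold score. A corpus of graded pairs is what makes that kind of gap measurable.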
Main file
cardon-LREC2020.pdf (109.17 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03095142, version 1 (4 January 2021)

Identifiers

  • HAL Id: hal-03095142, version 1

Cite

Rémi Cardon, Natalia Grabar. A French Corpus for Semantic Similarity. LREC 12th Edition of its Language Resources and Evaluation Conference, May 2020, Marseille, France. ⟨hal-03095142⟩