TermEval 2020: TALN-LS2N System for Automatic Term Extraction - HAL open archive
Conference paper, Year: 2020

TermEval 2020: TALN-LS2N System for Automatic Term Extraction

Amir Hazem
  • Role: Author
  • PersonId: 1103522
Mérième Bouhandi
  • Role: Author
  • PersonId: 1103523
Florian Boudin
  • Role: Author
  • PersonId: 1103524

Abstract

Automatic terminology extraction is a notoriously difficult task that aims to reduce the effort required to manually identify terms in domain-specific corpora by automatically providing a ranked list of candidate terms. Approaches to this task fall into four main categories: (i) rule-based approaches, (ii) feature-based approaches, (iii) context-based approaches, and (iv) hybrid approaches. For this first TermEval shared task, we explore a feature-based approach and a deep neural network multitask approach, BERT, that we fine-tune for term extraction. We show that BERT models (RoBERTa for English and CamemBERT for French) outperform other systems for both French and English.
Main file
2020.computerm-1.13.pdf (275.66 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03477769, version 1 (13-12-2021)

Identifiers

  • HAL Id: hal-03477769, version 1

Cite

Amir Hazem, Mérième Bouhandi, Florian Boudin, Béatrice Daille. TermEval 2020: TALN-LS2N System for Automatic Term Extraction. International Workshop on Computational Terminology (COMPUTERM), May 2020, Marseille, France. ⟨hal-03477769⟩
91 views
81 downloads
