Book chapter, Year: 2020

Parallel sentence alignment from biomedical comparable corpora

Rémi Cardon
  • Role: Author
  • PersonId: 184596
  • IdHAL: remi-cardon

Abstract

Parallel sentences convey semantically similar information but vary along a given dimension, such as language or register. Parallel sentences that differ in register (for example, drawn from expert and non-expert documents) can be exploited for automatic text simplification, whose aim is to make information easier to access and understand. In the biomedical field, simplification may help patients understand medical and health texts. Yet no such resources are currently available. We propose to exploit comparable corpora distinguished by their registers (specialized and simplified versions) in order to detect and align parallel sentences. These corpora are in French and belong to the biomedical domain. We treat the task as binary classification (alignment/non-alignment). Our results show that the method presented here can be used to automatically generate a corpus of parallel sentences from our comparable corpus.
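To make the binary framing concrete, here is a minimal sketch of classifying cross-register sentence pairs as alignment or non-alignment. The abstract does not state which features or classifier the authors use, so everything below is an assumption for illustration: a plain word-overlap (Jaccard) score with a fixed threshold stands in for the classifier, the function names and the threshold value of 0.3 are hypothetical, and the French sentences are invented rather than taken from the authors' corpora.

```python
import re
from itertools import product


def tokens(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two sentences, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def is_aligned(specialized, simplified, threshold=0.3):
    """Binary decision: True = alignment, False = non-alignment.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return jaccard(specialized, simplified) >= threshold


def align(specialized_doc, simplified_doc, threshold=0.3):
    """Classify every cross-document sentence pair and keep the aligned ones."""
    return [
        (s, t)
        for s, t in product(specialized_doc, simplified_doc)
        if is_aligned(s, t, threshold)
    ]


# Toy, invented sentences (not from the authors' corpora).
specialized = [
    "Le paracétamol est indiqué dans le traitement symptomatique de la fièvre.",
    "Ce médicament est contre-indiqué en cas d'insuffisance hépatique sévère.",
]
simplified = [
    "Le paracétamol est utilisé pour le traitement de la fièvre.",
    "Conservez ce médicament hors de portée des enfants.",
]

pairs = align(specialized, simplified)
for s, t in pairs:
    print(f"{jaccard(s, t):.2f}\t{s}\t{t}")
```

In this toy setup only the first specialized sentence and the first simplified sentence are kept as a pair. A learned classifier trained on manually aligned pairs would replace the fixed threshold in practice; the sketch only shows the shape of the decision over candidate pairs.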
Main file
cardon-MIE2020.pdf (118.53 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03095183, version 1 (04-01-2021)

Identifiers

HAL Id: hal-03095183
DOI: 10.3233/SHTI200183

Cite

Rémi Cardon, Natalia Grabar. Parallel sentence alignment from biomedical comparable corpora. Studies in Health Technology and Informatics, 270, pp.362-366, 2020, ⟨10.3233/SHTI200183⟩. ⟨hal-03095183⟩