Conference paper, Year: 2018

Parallel Corpora for the Biomedical Domain

Antonio Jimeno Yepes
  • Role: Author
  • PersonId : 1034914
L Neves
  • Role: Author
Karin Verspoor
  • Role: Author

Abstract

A vast amount of biomedical information is available in the form of scientific literature and government-authored patient information documents. While English is the most widely used language in many of these sources, there is a need to provide access to health information in languages other than English. Parallel corpora can be leveraged to implement cross-lingual information retrieval or machine translation tools. Herein, we review the extent of parallel corpus coverage in the biomedical domain. Specifically, we perform a scoping review of existing resources and describe the recent development of new datasets for scientific literature (the EDP dataset and an extension of the Scielo corpus) and clinical trials (the ReBEC corpus). These corpora are currently being used in the biomedical task of the Conference on Machine Translation (WMT’16 and WMT’17), which illustrates their potential for improving and evaluating biomedical machine translation systems. Furthermore, we suggest additional applications for multilingual natural language processing using these resources, and plan to extend resource coverage to additional text genres and language pairs.
Main file
wmtLREC2018_final.pdf (144.48 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01847303 , version 1 (23-07-2018)

Identifiers

  • HAL Id : hal-01847303 , version 1

Cite

Aurélie Névéol, Antonio Jimeno Yepes, L Neves, Karin Verspoor. Parallel Corpora for the Biomedical Domain. International Conference on Language Resources and Evaluation, ELRA, May 2018, Miyazaki, Japan. ⟨hal-01847303⟩
152 views
297 downloads
