Analyse morphologique en terminologie biomédicale par alignement et apprentissage non-supervisé (Morphological Analysis of Biomedical Terminology through Alignment and Unsupervised Learning)

Vincent Claveau (1, *), Ewa Kijak (1)
* Corresponding author
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : In the biomedical domain, many terms are neoclassical compounds, that is, terms composed of several Greek or Latin roots. Studying their morphology is important for numerous applications, since it makes it possible to structure these terms, retrieve them efficiently, translate them, and so on. In this paper, we propose an original yet fruitful approach to this morphological analysis that relies on Japanese, more precisely on terms written in kanji, as a pivot language. To do so, we have developed a specially crafted alignment algorithm. Aligning French terms with their kanji-based counterparts simultaneously yields a decomposition of each French term into morphs and a kanji label for each morph. Evaluated on a large dataset, our approach achieves a precision greater than 70% and proves relevant compared with existing techniques. We also illustrate the validity of our reasoning through two direct applications of the produced alignments: translation of unknown terms, and discovery of relationships between morphs for terminological structuring.
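To make the output of such an alignment concrete, here is a toy sketch of what "a decomposition of the French term into morphs, with a kanji label for each morph" looks like. It is not the authors' algorithm: their method learns the morph/kanji correspondences without supervision, whereas this sketch assumes a small, hypothetical hand-written lexicon (`LEXICON`) and a strictly left-to-right (monotonic) alignment, and simply searches for a segmentation of the French term that matches the kanji sequence.

```python
from functools import lru_cache

# Hypothetical toy lexicon (an assumption for illustration only): each kanji
# is paired with French surface forms it may correspond to. The paper learns
# such correspondences without supervision; here they are written by hand.
LEXICON = {
    "胃": ["gastro", "gastr"],  # stomach
    "腸": ["entéro", "entér"],  # intestine
    "肝": ["hépato", "hépat"],  # liver
    "炎": ["ite"],              # inflammation
}


def align(french, kanji):
    """Return (morph, kanji) pairs covering the whole French term, aligning
    the kanji left to right, or None if no such alignment exists."""

    @lru_cache(maxsize=None)
    def search(i, j):
        # i: position in the French term; j: index in the kanji string
        if i == len(french) and j == len(kanji):
            return ()  # both strings fully consumed: success
        if j == len(kanji):
            return None  # French characters left over with no kanji to label them
        for morph in LEXICON.get(kanji[j], []):
            if french.startswith(morph, i):
                rest = search(i + len(morph), j + 1)
                if rest is not None:
                    return ((morph, kanji[j]),) + rest
        return None  # no candidate morph for this kanji fits here

    result = search(0, 0)
    return list(result) if result is not None else None


print(align("gastroentérite", "胃腸炎"))
# → [('gastro', '胃'), ('entér', '腸'), ('ite', '炎')]
```

The memoised search tries every candidate morph for the current kanji, so alternative boundaries such as "gastro" versus "gastr" are resolved by whichever choice lets the rest of the term align. A real system would also have to handle non-monotonic orderings and morphs unknown to the lexicon, which this sketch does not attempt.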
Document type :
Conference paper
Conférence Traitement automatique des langues naturelles, TALN'10, Jul 2010, Montréal, Québec, Canada. 2010, 〈http://www.iro.umontreal.ca/~felipe/TALN2010/Xml/Papers/all/taln2010_submission_83.pdf〉

https://hal.inria.fr/inria-00561086
Contributor : Patrick Gros
Submitted on : Monday, January 31, 2011 - 4:01:14 PM
Last modification on : Friday, November 16, 2018 - 1:21:51 AM

Identifiers

  • HAL Id : inria-00561086, version 1

Citation

Vincent Claveau, Ewa Kijak. Analyse morphologique en terminologie biomédicale par alignement et apprentissage non-supervisé. Conférence Traitement automatique des langues naturelles, TALN'10, Jul 2010, Montréal, Québec, Canada. 2010, 〈http://www.iro.umontreal.ca/~felipe/TALN2010/Xml/Papers/all/taln2010_submission_83.pdf〉. 〈inria-00561086〉
