Unsupervised and semi-supervised morphological analysis for Information Retrieval in the biomedical domain

Vincent Claveau 1
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : In the biomedical field, the key to accessing information is the use of specialized terms. However, in most Indo-European languages, these terms are complex morphological structures. The aim of the presented work is to identify the various meaningful components of these terms and to use this analysis to improve biomedical Information Retrieval. We present an approach that combines automatic alignment using a pivot language with analogical learning, allowing an accurate morphological analysis of terms. These morphological analyses are used to improve the indexing of medical documents. The experiments reported in this paper show the validity of this approach, with a 10% improvement in Mean Average Precision (MAP) over a standard IR system.


https://hal.archives-ouvertes.fr/hal-00760114
Contributor : Vincent Claveau
Submitted on : Monday, December 3, 2012 - 2:54:03 PM
Last modification on : Friday, November 16, 2018 - 1:24:48 AM
Long-term archiving on : Saturday, December 17, 2016 - 7:20:18 PM

File

Claveau_Kijak_Coling2012.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00760114, version 1

Citation

Vincent Claveau. Unsupervised and semi-supervised morphological analysis for Information Retrieval in the biomedical domain. COLING - 24th International Conference on Computational Linguistics, Dec 2012, Mumbai, India. ⟨hal-00760114⟩
