Multilingual Extraction of Semantic Indexes

Abstract : This article addresses multilingual document indexing. We propose an indexing method that proceeds in several stages. First, the most important terms of the document are extracted using general characteristics of languages and statistical methods; the term extraction stages can therefore be applied to any document, whatever its language. Second, the method uses a multilingual ontology to find the most relevant concepts representing the document content. The method can thus be applied to a multilingual corpus containing documents written in different languages. This indexing procedure is part of a Multilingual Document System named SyDoM, which manages XML documents.
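The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the frequency-based scoring, the token-length filter used as a crude language-independent stand-in for function-word removal, the toy ontology, and all names (`extract_terms`, `TOY_ONTOLOGY`, `index_document`) are assumptions made purely for illustration.

```python
import re
from collections import Counter


def extract_terms(text, top_k=5, min_len=4):
    """Stage 1 (assumed): rank tokens by raw frequency.

    Short tokens are dropped as a crude, language-agnostic proxy for
    function words; no language-specific resource is used.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(t for t in tokens if len(t) >= min_len)
    return [term for term, _ in counts.most_common(top_k)]


# Hypothetical toy ontology: concept id -> labels per language.
TOY_ONTOLOGY = {
    "C_HEART": {"en": ["heart"], "fr": ["coeur", "cœur"]},
    "C_DISEASE": {"en": ["disease"], "fr": ["maladie"]},
}


def build_label_index(ontology):
    """Map every label, in any language, to its concept id."""
    index = {}
    for concept_id, labels in ontology.items():
        for lang_labels in labels.values():
            for label in lang_labels:
                index[label.lower()] = concept_id
    return index


def index_document(text, ontology, top_k=5):
    """Stage 2 (assumed): keep the concepts whose labels match extracted terms."""
    label_index = build_label_index(ontology)
    concepts = []
    for term in extract_terms(text, top_k=top_k):
        concept_id = label_index.get(term)
        if concept_id is not None and concept_id not in concepts:
            concepts.append(concept_id)
    return concepts


fr_doc = "La maladie du coeur est une maladie fréquente. Le coeur est un muscle."
en_doc = "Heart disease is a common disease. The heart is a muscle."

print(index_document(fr_doc, TOY_ONTOLOGY))
print(index_document(en_doc, TOY_ONTOLOGY))
```

In this sketch the French and English documents end up indexed by the same set of concept identifiers, which is the property that makes a concept-based index usable across a multilingual corpus.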
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01563180
Contributor : Équipe Gestionnaire Des Publications Si Liris
Submitted on : Monday, July 17, 2017 - 1:30:20 PM
Last modification on : Friday, January 11, 2019 - 4:35:40 PM

Identifiers

  • HAL Id : hal-01563180, version 1

Citation

Catherine Roussey, Sylvie Calabretto, Farah Harrathi. Multilingual Extraction of Semantic Indexes. Proceedings of the 2007 international workshop on Semantically aware document processing and indexing 2007, May 2007, Montpellier, France. pp.1-8. ⟨hal-01563180⟩
