Book sections

Towards a Better Semantic Matching for Indexation Improvement of Error-Prone (Semi-)Structured XML Documents

Arnaud Renard 1 Sylvie Calabretto 1 Béatrice Rumpler 1 
1 DRIM - Distribution, Recherche d'Information et Mobilité
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract : Documents whose textual content contains errors (which we call noisy documents) are difficult for Information Retrieval (IR) systems to handle. The same holds for the (semi-)structured IR systems this paper deals with, and the problem is even greater when those systems rely on semantics. To exploit semantics, they need an additional external semantic resource related to the document collection. Ranking is then made possible by comparing concepts using similarity measures. These measures assume that the concepts associated with the words have been identified without ambiguity. However, this assumption cannot be made for noisy documents, where words are potentially misspelled, so that a word takes on a different meaning or, at the least, becomes a non-word. Semantic-aware (semi-)structured IR systems rely on basic concept identification but do not account for spelling uncertainty. As this can degrade system results, we suggest a way to detect and correct misspelled terms that can be used in the document pre-processing stage of IR systems. First results on small datasets seem promising.
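To make the idea of detecting and correcting misspelled terms during pre-processing concrete, here is a minimal Python sketch. It is not the method of the paper, which the abstract does not detail. It assumes a plain edit-distance (Levenshtein) lookup against a known vocabulary, and the names `edit_distance`, `correct_term`, `correct_tokens` and the `max_distance` threshold are hypothetical choices made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def correct_term(term: str, vocabulary: set, max_distance: int = 2) -> str:
    """Return the closest vocabulary word if term is a non-word (assumed rule).

    A term already in the vocabulary is returned unchanged, so this sketch
    only handles non-word errors, not misspellings that yield another
    valid word.
    """
    if not vocabulary or term in vocabulary:
        return term
    # Ties on distance are broken alphabetically to keep results deterministic.
    best = min(vocabulary, key=lambda w: (edit_distance(term, w), w))
    return best if edit_distance(term, best) <= max_distance else term


def correct_tokens(tokens, vocabulary, max_distance=2):
    """Lowercase and correct each token before indexing (hypothetical step)."""
    return [correct_term(t.lower(), vocabulary, max_distance) for t in tokens]
```

For example, with the vocabulary `{"retrieval", "document", "semantic"}`, calling `correct_tokens(["Semantic", "retreival"], vocabulary)` maps the non-word `retreival` to `retrieval`. A misspelling that produces another valid word is left untouched here, which is precisely the harder case the abstract points to.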
Document type :
Book sections
Contributor : Équipe gestionnaire des publications SI LIRIS
Submitted on : Friday, August 19, 2016 - 5:46:50 PM
Last modification on : Thursday, February 10, 2022 - 9:18:03 AM



Arnaud Renard, Sylvie Calabretto, Béatrice Rumpler. Towards a Better Semantic Matching for Indexation Improvement of Error-Prone (Semi-)Structured XML Documents. Joaquim Filipe, José Cordeiro. Lecture Notes in Business Information Processing (LNBIP), Springer-Verlag, pp.286-298, 2011, ⟨10.1007/978-3-642-22810-0_21⟩. ⟨hal-01354866⟩


