Conference papers

DTD based costs for Tree-Edit distance in Structured Information Retrieval

Abstract: In this paper we present a Structured Information Retrieval (SIR) model based on graph matching. Our approach combines content propagation, which handles sibling relationships, with a document-query structure matching process. The latter is based on Tree-Edit Distance (TED), defined as the minimum-cost sequence of insert, delete, and replace operations needed to turn one tree into another. To our knowledge this algorithm has never been used in ad-hoc SIR. As the effectiveness of TED relies on both the input trees and the edit costs, we first present a focused subtree extraction technique that selects the elements of the document most representative of the query. We then describe how we set the TED costs based on the Document Type Definition (DTD). Finally, we discuss our results according to the type of collection (data-oriented or text-oriented). Experiments are conducted on two INEX test sets: the 2010 Datacentric collection and the 2005 Ad-hoc collection.
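
As an illustration of the TED definition in the abstract, the following is a minimal sketch of the textbook recursive formulation for ordered, labeled trees, memoized for readability rather than speed. It is not the authors' implementation: the names `node` and `tree_edit_distance`, the tuple representation of trees, and the default unit costs are all assumptions made for this example. The cost functions are parameters, so a cost scheme derived from a DTD could be plugged in; the paper's actual cost settings are not reproduced here.

```python
from functools import lru_cache


def node(label, *children):
    """Build an ordered, labeled tree as a hashable (label, children) tuple."""
    return (label, tuple(children))


def tree_edit_distance(t1, t2,
                       insert_cost=lambda label: 1,
                       delete_cost=lambda label: 1,
                       replace_cost=lambda a, b: 0 if a == b else 1):
    """Minimum total cost of insert/delete/replace operations turning t1 into t2.

    Forests are tuples of trees. Deleting a node promotes its children
    into the place the node occupied; inserting is the mirror operation.
    """

    @lru_cache(maxsize=None)
    def dist(f, g):
        if not f and not g:
            return 0
        if not g:
            # Only deletions remain: remove the rightmost root of f.
            label, kids = f[-1]
            return dist(f[:-1] + kids, g) + delete_cost(label)
        if not f:
            # Only insertions remain: account for the rightmost root of g.
            label, kids = g[-1]
            return dist(f, g[:-1] + kids) + insert_cost(label)
        (a, a_kids), (b, b_kids) = f[-1], g[-1]
        return min(
            # Delete the rightmost root of f.
            dist(f[:-1] + a_kids, g) + delete_cost(a),
            # Insert the rightmost root of g.
            dist(f, g[:-1] + b_kids) + insert_cost(b),
            # Map the two rightmost roots onto each other, then solve
            # their child forests and the remaining forests separately.
            dist(a_kids, b_kids) + dist(f[:-1], g[:-1]) + replace_cost(a, b),
        )

    return dist((t1,), (t2,))


# Hypothetical document and query trees (labels are made up for this sketch).
document = node("article", node("title"), node("body", node("sec")))
query = node("article", node("title"), node("sec"))

unit = tree_edit_distance(document, query)

# Hypothetical cost table: deleting a "body" element is made cheaper.
cheap_body = tree_edit_distance(
    document, query,
    delete_cost=lambda label: 0.5 if label == "body" else 1,
)
```

With unit costs the only edit needed is the deletion of the `body` node, so `unit` is 1; with the hypothetical cost table the same edit costs 0.5. This shows how the choice of costs, and not only the tree shapes, drives the resulting distance.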

Contributor: Open Archive Toulouse Archive Ouverte (OATAO)
Submitted on: Monday, February 8, 2016 - 10:35:14 AM
Last modification on: Wednesday, June 1, 2022 - 4:10:27 AM
Long-term archiving on: Friday, November 11, 2016 - 8:32:19 PM




  • HAL Id: hal-01264568, version 1
  • OATAO: 12346


Cyril Laitang, Karen Pinel-Sauvagnat, Mohand Boughanem. DTD based costs for Tree-Edit distance in Structured Information Retrieval. 35th European Conference on Information Retrieval (ECIR 2013), Mar 2013, Moscow, Russia. pp.158-179. ⟨hal-01264568⟩


