Detecting Irrelevant subtrees to improve probabilistic learning from tree-structured data

Abstract : With the rapid growth in the amount of available structured data (such as XML documents), many algorithms have emerged for dealing with tree-structured data. In this article, we present a probabilistic approach that aims to prune, a posteriori, noisy or irrelevant subtrees from a set of trees. The originality of this approach, compared with classic data reduction techniques, is that only a part of a tree (i.e. a subtree) can be deleted, rather than the whole tree. Our method relies on confidence intervals on a partition of subtrees, computed according to a given probability distribution. We propose an original approach to estimate these intervals on tree-structured data and show experimentally that it is beneficial in the presence of noise.
Document type :
Journal articles

https://hal.archives-ouvertes.fr/hal-00369445
Contributor : Marc Sebban
Submitted on : Thursday, March 19, 2009 - 6:42:45 PM
Last modification on : Wednesday, July 25, 2018 - 2:05:31 PM
Long-term archiving on : Tuesday, June 8, 2010 - 8:13:58 PM

Files

hbs_fi05_draft.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00369445, version 1

Citation

Amaury Habrard, Marc Bernard, Marc Sebban. Detecting Irrelevant subtrees to improve probabilistic learning from tree-structured data. Fundamenta Informaticae, Polskie Towarzystwo Matematyczne, 2005, 66 (1,2), pp.103-130. ⟨hal-00369445⟩
