Journal article, Fundamenta Informaticae, Year: 2005

Detecting Irrelevant subtrees to improve probabilistic learning from tree-structured data

Abstract

With the rapid growth in the amount of available structured data (such as XML documents), many algorithms have emerged for dealing with tree-structured data. In this article, we present a probabilistic approach that aims to prune, a posteriori, noisy or irrelevant subtrees in a set of trees. The originality of this approach, compared with classic data reduction techniques, is that only a part of a tree (i.e. a subtree) can be deleted, rather than the whole tree. Our method is based on confidence intervals, defined on a partition of subtrees and computed according to a given probability distribution. We propose an original approach to assess these intervals on tree-structured data, and we show experimentally that it is useful in the presence of noise.
Main file
hbs_fi05_draft.pdf (227.94 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00369445, version 1 (19-03-2009)

Identifiers

  • HAL Id: hal-00369445, version 1

Cite

Amaury Habrard, Marc Bernard, Marc Sebban. Detecting Irrelevant subtrees to improve probabilistic learning from tree-structured data. Fundamenta Informaticae, 2005, 66 (1,2), pp.103-130. ⟨hal-00369445⟩