Detecting Irrelevant subtrees to improve probabilistic learning from tree-structured data

Amaury Habrard; Marc Bernard; Marc Sebban

Journal article, Fundamenta Informaticae, Year: 2005

Amaury Habrard

Role: Author
PersonId: 439
IdHAL: amaury-habrard
ORCID: 0000-0003-3038-9347
IdRef: 084103655

Laboratoire d'informatique Fondamentale de Marseille - UMR 6166

Marc Bernard

Role: Author

Laboratoire Hubert Curien

Marc Sebban

Role: Author
PersonId: 5203
IdHAL: marc-sebban
ORCID: 0000-0001-6851-169X
IdRef: 050802623

Laboratoire Hubert Curien

Abstract

With the rapid growth in the amount of available structured data (such as XML documents), many algorithms have emerged for dealing with tree-structured data. In this article, we present a probabilistic approach that aims to prune, a posteriori, noisy or irrelevant subtrees from a set of trees. The originality of this approach, compared with classic data reduction techniques, is that only a part of a tree (i.e., a subtree) can be deleted, rather than the whole tree. Our method is based on confidence intervals over a partition of subtrees, computed according to a given probability distribution. We propose an original approach for estimating these intervals on tree-structured data, and we show experimentally that it is beneficial in the presence of noise.

Keywords

data reduction; tree-structured data; noisy data; stochastic tree automata

Domains

Machine Learning [cs.LG]

Main file

hbs_fi05_draft.pdf (227.94 KB)

Origin: Files produced by the author(s)

Contributor: Marc Sebban

https://hal.science/hal-00369445

Submitted on: Thursday, March 19, 2009, 18:42:45

Last modified on: Thursday, August 31, 2023, 12:14:35

Long-term archiving on: Tuesday, June 8, 2010, 20:13:58

Dates and versions

hal-00369445 , version 1 (19-03-2009)

Identifiers

HAL Id: hal-00369445, version 1

Cite

Amaury Habrard, Marc Bernard, Marc Sebban. Detecting Irrelevant subtrees to improve probabilistic learning from tree-structured data. Fundamenta Informaticae, 2005, 66 (1,2), pp.103-130. ⟨hal-00369445⟩

Collections

UNIV-ST-ETIENNE IOGS LIF CNRS UNIV-AMU LAHC PARISTECH LIS-LAB UDL

120 views

364 downloads
