A belief networks-based generative model for structured documents. An application to the XML categorization

Ludovic Denoyer 1 Patrick Gallinari 1
1 APA - Apprentissage et Acquisition des connaissances
LIP6 - Laboratoire d'Informatique de Paris 6
Abstract : We present a generative Bayesian model for structured (e.g., XML) documents. The model takes both structure and content information into account simultaneously. It is used here to classify XML documents. We adopt a machine learning approach: the model parameters are learned from a labeled training set of representative documents. We discuss the role of structural information in classification and describe experiments on a small collection of class-labeled structured documents. We also present preliminary results showing how this model could classify documents whose DTDs are not represented in the training set.
https://hal.archives-ouvertes.fr/hal-01357594
Contributor : Ludovic Denoyer
Submitted on : Tuesday, August 30, 2016 - 10:17:33 AM
Last modification on : Thursday, March 21, 2019 - 2:19:21 PM

Citation

Ludovic Denoyer, Patrick Gallinari. A belief networks-based generative model for structured documents. An application to the XML categorization. MLDM 2003 - Third International Conference on Machine Learning and Data Mining in Pattern Recognition, Jul 2003, Leipzig, Germany. pp.328-342, ⟨10.1007/3-540-45065-3_29⟩. ⟨hal-01357594⟩
