Using belief networks and Fisher kernels for structured document classification

Ludovic Denoyer (1), Patrick Gallinari (1)
(1) APA - Apprentissage et Acquisition des connaissances, LIP6 - Laboratoire d'Informatique de Paris 6
Abstract: We consider the classification of structured (e.g. XML) textual documents. We first propose a generative model based on Belief Networks that allows us to take both structure and content information into account simultaneously. We then show how this model can be extended into a more efficient classifier using the Fisher kernel method. In both cases, model parameters are learned from a labelled training set of representative documents. We present experiments on two collections of structured documents: WebKB, which has become a reference corpus for HTML page classification, and the new INEX corpus, which was developed for the evaluation of XML information retrieval systems.
Contributor: Ludovic Denoyer
Submitted on: Tuesday, August 30, 2016 - 10:17:34 AM
Last modification on: Thursday, March 21, 2019 - 2:18:56 PM




Ludovic Denoyer, Patrick Gallinari. Using belief networks and Fisher kernels for structured document classification. PKDD 2003 - 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, Sep 2003, Cavtat-Dubrovnik, Croatia. pp.120-131, ⟨10.1007/978-3-540-39804-2_13⟩. ⟨hal-01357596⟩


