Conference paper, Year: 2007

Probabilistic Model for Structured Document Mapping

Abstract

We address the problem of automatically learning to map heterogeneous semi-structured documents onto a mediated target XML schema. We adopt a machine learning approach in which the mapping between input and target documents is learned from a training corpus of documents. We first introduce a general stochastic model of semi-structured document generation and transformation. This model relies on the concept of a meta-document, a latent variable that provides the link between input and target documents. It allows us to learn the correspondences even when the input documents are expressed in a large variety of schemas. We then detail an instance of the general model for the particular task of HTML to XML conversion. This instance is tested on three different corpora using two different inference methods: a dynamic programming method and an approximate LaSO-based method.
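To make the idea of dynamic programming inference more concrete, the following is a minimal sketch and not the authors' model. It assumes the input document has already been flattened into a sequence of HTML leaf tags and that each leaf receives exactly one target label, so that exact inference reduces to the Viterbi algorithm. The label set, the probability tables and the function name `viterbi` are all hypothetical, chosen only for illustration.

```python
import math

def viterbi(observations, labels, log_start, log_trans, log_emit):
    """Return the highest-scoring label sequence and its log-probability."""
    if not observations:
        return [], 0.0
    # best[i][y]: best log-score of any labelling of observations[:i+1] ending in y
    best = [{y: log_start[y] + log_emit[y][observations[0]] for y in labels}]
    back = []  # back[i][y]: best predecessor of label y at position i+1
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + log_trans[p][y])
            scores[y] = best[-1][prev] + log_trans[prev][y] + log_emit[y][obs]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    last = max(labels, key=lambda y: best[-1][y])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]

def to_log(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}

# Hypothetical toy parameters (assumptions, not taken from the paper).
LABELS = ["title", "author", "text"]
LOG_START = {"title": math.log(0.8), "author": math.log(0.1), "text": math.log(0.1)}
LOG_TRANS = to_log({
    "title":  {"title": 0.1, "author": 0.7, "text": 0.2},
    "author": {"title": 0.1, "author": 0.3, "text": 0.6},
    "text":   {"title": 0.1, "author": 0.1, "text": 0.8},
})
# Probability of seeing a given HTML tag for a leaf with a given target label.
LOG_EMIT = to_log({
    "title":  {"h1": 0.8,  "i": 0.1,  "p": 0.1},
    "author": {"h1": 0.1,  "i": 0.7,  "p": 0.2},
    "text":   {"h1": 0.05, "i": 0.15, "p": 0.8},
})

if __name__ == "__main__":
    path, score = viterbi(["h1", "i", "p", "p"], LABELS, LOG_START, LOG_TRANS, LOG_EMIT)
    print(path, math.exp(score))
```

In this toy setting the search is exact because each label depends only on its predecessor. An approximate method such as the LaSO-based one named in the abstract would be the alternative when the dependencies of the model make exhaustive dynamic programming too costly.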

Dates and versions

hal-01336148 , version 1 (22-06-2016)

Identifiers

HAL Id: hal-01336148
DOI: 10.1007/978-3-540-73499-4_64

Cite

Guillaume Wisniewski, Francis Maes, Ludovic Denoyer, Patrick Gallinari. Probabilistic Model for Structured Document Mapping. 5th International Conference on Machine Learning and Data Mining in Pattern Recognition (MLDM'07), Jul 2007, Leipzig, Germany. pp.854-867, ⟨10.1007/978-3-540-73499-4_64⟩. ⟨hal-01336148⟩