HAL: hal-00765545, version 1

Finding optimal probabilistic generators for XML collections
Abiteboul S., Amsterdamer Y., Deutch D., Milo T., Senellart P.
ICDT, Berlin, Germany (2012) - http://hal.inria.fr/hal-00765545
Peer-reviewed conferences/proceedings
Computer Science/Databases
Serge Abiteboul 1, Yael Amsterdamer 2, Daniel Deutch 1, Tova Milo 3, Pierre Senellart 4
1:  Laboratoire Spécification et Vérification [Cachan] (LSV)
http://www.lsv.ens-cachan.fr/
CNRS : UMR8643 – INRIA – École normale supérieure de Cachan - ENS Cachan
Bâtiment d'Alembert 61 Avenue du Président Wilson 94235 CACHAN CEDEX
France
2:  Télécom ParisTech
http://www.telecom-paristech.fr/
Institut Mines-Télécom
46 rue Barrault 75634 Paris Cedex 13
France
3:  Tel Aviv University
http://www.tau.ac.il/index-eng.html
Israel
4:  Institut Télécom - Télécom ParisTech
http://www.telecom-paristech.fr/
Télécom ParisTech
37-39 rue Dareau, 75014 Paris
France
Given a corpus of XML documents and its schema, we study the problem of finding an optimal (generative) probabilistic model, where optimality means maximizing the likelihood that the particular corpus is generated. Focusing first on the structure of documents, we present an efficient algorithm for finding the best generative probabilistic model in the absence of constraints. We further study the problem in the presence of integrity constraints, namely key, inclusion, and domain constraints. In this case, we study two different kinds of generators. First, we consider a continuation-test generator that performs tests of schema satisfiability while generating documents; these tests prevent the generation of a document that violates the constraints but, as we will see, they are computationally expensive. We also study a restart generator that may generate an invalid document and, when this is the case, restarts and tries again. Finally, we consider the injection of data values into the structure, to obtain a full XML document. We study different approaches for generating these values.
English

2012
international
ICDT
Berlin
Germany
2012-03-26
127-139

Acronym WEBDAM
Files attached to this document:
PDF
abiteboul2012finding.pdf(494.7 KB)