On Correcting XML Documents with Respect to a Schema

Abstract: We present an algorithm for correcting an XML document with respect to schema constraints expressed as a document type definition. Given a well-formed XML document t seen as a tree, a schema S and a non-negative threshold th, the algorithm finds every tree t′ that is valid with respect to S and whose edit distance from t is at most th. The algorithm is based on a recursive exploration of the finite-state automata representing the structural constraints imposed by the schema, and on the construction of an edit distance matrix that stores the edit sequences leading to correction trees. We prove the termination, correctness and completeness of the algorithm, as well as its exponential time complexity. We also report experimental tests on real-life XML data showing the influence of various input parameters on the execution time and on the number of solutions found. In these tests, the implementation exhibits polynomial rather than exponential behavior. The implementation has been made public under the GNU LGPL v3 license. As we show in our in-depth discussion of the related work, this is the first full-fledged study of the document-to-schema correction problem.
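To make the problem statement concrete, the following is a minimal sketch of a word-level analogue, not the tree algorithm of the article. It assumes that the children of one element form a word, that the element's content model is given as a deterministic finite-state automaton, and that insertion, deletion and relabeling each cost 1. The function name `corrections` and the automaton encoding are hypothetical choices made for illustration only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: delta maps (state, symbol) to the next state.
Transitions = Dict[Tuple[int, str], int]


def corrections(word: List[str], start: int, accepting: Set[int],
                delta: Transitions, th: int) -> Dict[Tuple[str, ...], int]:
    """Return every word accepted by the automaton whose edit distance
    from `word` is at most `th`, mapped to that distance."""
    alphabet = sorted({sym for (_, sym) in delta})
    found: Dict[Tuple[str, ...], int] = {}

    def explore(i: int, state: int, cost: int, out: Tuple[str, ...]) -> None:
        if cost > th:
            return  # threshold exceeded: prune this branch
        if i == len(word) and state in accepting:
            # Keep the cheapest edit sequence that produces this word.
            if cost < found.get(out, th + 1):
                found[out] = cost
        for sym in alphabet:
            nxt = delta.get((state, sym))
            if nxt is None:
                continue
            # Insert `sym` without consuming an input symbol.
            explore(i, nxt, cost + 1, out + (sym,))
            if i < len(word):
                # Keep word[i] (cost 0) or relabel it to `sym` (cost 1).
                step = 0 if sym == word[i] else 1
                explore(i + 1, nxt, cost + step, out + (sym,))
        if i < len(word):
            # Delete word[i]: the automaton state does not change.
            explore(i + 1, state, cost + 1, out)

    explore(0, start, 0, ())
    return found


# Assumed content model (a, b*, c): 0 -a-> 1, 1 -b-> 1, 1 -c-> 2.
example_delta: Transitions = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}
example = corrections(["a", "b"], 0, {2}, example_delta, 1)
# sorted(example.items()) is [(('a', 'b', 'c'), 1), (('a', 'c'), 1)]
```

Every recursive call either consumes an input symbol or increases the cost, so the search stops once the threshold is reached. The sketch enumerates edit sequences directly and is exponential in the worst case; it does not use the edit distance matrix described in the abstract and does not descend into subtrees.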
Document type:
Journal article
The Computer Journal, Oxford University Press (UK), 2014, 57 (5), pp.639-674

https://hal.archives-ouvertes.fr/hal-00999637
Contributor: Joshua Amavi
Submitted on: Tuesday, June 3, 2014 - 17:52:03
Last modified on: Thursday, February 7, 2019 - 17:19:52

Identifiers

  • HAL Id : hal-00999637, version 1

Citation

Joshua Amavi, Béatrice Bouchou, Agata Savary. On Correcting XML Documents with Respect to a Schema. The Computer Journal, Oxford University Press (UK), 2014, 57 (5), pp.639-674. 〈hal-00999637〉
