On the Midpoint of a Set of XML Documents.

Alberto Abellò; Xavier de Palol; Mohand-Said Hacid

doi:10.1007/11546924_43

Journal article, Lecture Notes in Computer Science. Year: 2005


Alberto Abellò

Role: Author

Universitat Politècnica de Catalunya (Polytechnic University of Catalonia), Barcelona

Xavier de Palol

Role: Author

Universitat Politècnica de Catalunya (Polytechnic University of Catalonia), Barcelona

Mohand-Said Hacid

Role: Author
PersonId: 7283
IdHAL: mohand-said-hacid
IdRef: 070848440

Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS)

Abstract

The WWW contains a huge number of documents. Some of them share a subject but are produced by different people or even different organizations. To guarantee the interchange of such documents, we can use XML, which allows sharing documents that do not have the same structure. However, this makes it difficult to understand the core of such heterogeneous documents (in general, no schema is available). In this paper, we offer a characterization and an algorithm to obtain the midpoint (in terms of a resemblance function) of a set of semi-structured, heterogeneous documents without optional elements. The trivial midpoint would be the set of elements common to all documents. Nevertheless, with several heterogeneous documents this set may be empty. Thus, we consider that the elements present in a given number of documents belong to the midpoint. An exact schema could always be found by generating optional elements. However, the exact schema of the whole set may be overspecialized (with many optional elements), which would make it useless.
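As an illustration of the thresholding idea only (this is not the paper's characterization, resemblance function, or algorithm), the following Python sketch assumes each element is identified by its root-to-node tag path, ignoring attributes, text content, sibling order, and repeated siblings. It keeps every path that occurs in at least `min_docs` documents. The function names `element_paths` and `frequent_paths` and the `min_docs` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_paths(root: ET.Element) -> set[str]:
    """Return the set of root-to-node tag paths in one document, e.g. 'book/title'."""
    paths: set[str] = set()

    def walk(node: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def frequent_paths(documents: list[str], min_docs: int) -> set[str]:
    """Keep the paths that occur in at least min_docs of the given XML strings."""
    counts: Counter[str] = Counter()
    for doc in documents:
        # A set is counted once per document, so repeated siblings do not inflate counts.
        counts.update(element_paths(ET.fromstring(doc)))
    return {path for path, n in counts.items() if n >= min_docs}


docs = [
    "<book><title/><author/><year/></book>",
    "<book><title/><author/></book>",
    "<book><title/><isbn/></book>",
]
print(sorted(frequent_paths(docs, 3)))  # ['book', 'book/title']
print(sorted(frequent_paths(docs, 2)))  # ['book', 'book/author', 'book/title']
```

With `min_docs` equal to the number of documents, the sketch reduces to the trivial midpoint (elements common to all documents); lowering it admits elements such as `author` that most, but not all, documents share. Because any document containing a path also contains all of that path's prefixes, a path is never more frequent than its parent, so the kept set is closed under ancestors and still describes a tree without optional elements.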

Subjects

Computer Science [cs]

Publication management team: SI LIRIS

https://hal.science/hal-01586608

Submitted on: Wednesday 13 September 2017, 09:21:47

Last modified on: Tuesday 9 April 2024, 08:56:04

Dates and versions

hal-01586608, version 1 (13-09-2017)

Identifiers

HAL Id: hal-01586608, version 1
DOI: 10.1007/11546924_43

Cite

Alberto Abellò, Xavier de Palol, Mohand-Said Hacid. On the Midpoint of a Set of XML Documents. Lecture Notes in Computer Science, 2005, 3588, pp.441-450. ⟨10.1007/11546924_43⟩. ⟨hal-01586608⟩


Collections

CNRS UNIV-LYON1 UNIV-LYON2 INSA-LYON EC-LYON LIRIS LABEXIMU INSA-GROUPE UDL

