Mining XML Documents

Abstract : XML documents are becoming ubiquitous because of their rich and flexible format, which can be used for a variety of applications. Given the increasing size of XML collections as information sources, mining techniques that traditionally exist for text collections or databases need to be adapted, and new methods need to be invented, to exploit the particular structure of XML documents. Basically, XML documents can be seen as trees, which are well known to be complex structures. This chapter describes various ways of using and simplifying this tree structure to model documents and support efficient mining algorithms. We focus on three mining tasks: classification and clustering, which are standard for text collections, and the discovery of frequent tree structures, which is especially important for heterogeneous collections. This chapter presents some recent approaches and algorithms to support these tasks, together with experimental evaluations on a variety of large XML collections.
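
As a rough illustration of what "simplifying the tree structure" can mean, the sketch below flattens an XML document into counts of root-to-node tag paths, a representation that standard classifiers or clustering methods could consume. It is not taken from the chapter: the function name `tag_paths`, the path encoding, and the sample document are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Count root-to-node tag paths in an XML document.

    Hypothetical helper for illustration: text content and attributes
    are ignored, only the element structure is kept.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(node, prefix):
        # Extend the path with the current tag and record one occurrence.
        path = prefix + "/" + node.tag
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return counts


# Made-up sample document, used only to exercise the function.
doc = "<article><title>T</title><sec><p>a</p><p>b</p></sec></article>"
paths = tag_paths(doc)
for path, n in sorted(paths.items()):
    print(path, n)
```

Each document becomes a sparse vector indexed by path, which discards sibling order and text but keeps the nesting of elements. This is one possible simplification among many.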
Cited literature [27 references]

https://hal.inria.fr/inria-00188899
Contributor : Anne-Marie Vercoustre
Submitted on : Monday, November 19, 2007 - 3:55:34 PM
Last modification on : Thursday, March 21, 2019 - 1:07:54 PM
Long-term archiving on : Monday, April 12, 2010 - 2:42:34 AM

File

XML-MiningChapter_final.pdf
Files produced by the author(s)

Citation

Laurent Candillier, Ludovic Denoyer, Patrick Gallinari, Marie-Christine Rousset, Alexandre Termier, et al. Mining XML Documents. P. Poncelet, F. Masseglia, M. Teisseire. Data Mining Patterns: New Methods and Applications, Information Science Reference, pp.198-219, 2007, ⟨10.4018/978-1-59904-162-9.ch009⟩. ⟨inria-00188899⟩
