Journal article, Information Retrieval Journal, Year: 2007

Learning Based Summarisation of XML Documents

Massih-Reza Amini
  • Role: Author
Anastasios Tombros
  • Role: Author
Nicolas Usunier
  • Role: Author
  • PersonId: 933831
Mounia Lalmas
  • Role: Author

Abstract

Documents formatted in eXtensible Markup Language (XML) are available in collections of various document types. In this paper, we present an approach for the summarisation of XML documents. The novelty of this approach is that it uses features drawn not only from the content of documents but also from their logical structure. We follow a machine learning, sentence extraction-based summarisation technique. To find which features are most effective for producing summaries, the approach treats sentence extraction as an ordering task. We evaluated our summarisation model using the INEX and SUMMAC datasets. The results demonstrate that including features from the logical structure of documents increases the effectiveness of the summariser, and that the learnable system is effective and well suited to summarising XML documents. Because the approach is generic, it applies not only to entire documents but also to elements of varying granularity within the XML tree. We view these results as a step towards the intelligent summarisation of XML documents.
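
To make the idea of sentence extraction as an ordering task concrete, here is a minimal sketch rather than the authors' method. The abstract does not specify the feature set or the learning algorithm, so everything below is an assumption chosen for illustration: the element names (`article`, `title`, `sec`, `p`), the three features (title-word overlap as a content feature; first-in-paragraph and element depth as structure features), and the pairwise perceptron used to learn a linear scoring function. The function names `sentence_features`, `train_pairwise` and `summarise` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def sentence_features(xml_text):
    """Return (sentence, features) pairs for every sentence inside <p> elements."""
    root = ET.fromstring(xml_text)
    title_words = set(re.findall(r"\w+", (root.findtext("title") or "").lower()))
    rows = []

    def walk(elem, depth):
        if elem.tag == "p":
            text = "".join(elem.itertext()).strip()
            sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
            for i, sent in enumerate(sentences):
                words = set(re.findall(r"\w+", sent.lower()))
                # Content feature (assumed): share of title words in the sentence.
                overlap = len(words & title_words) / max(len(title_words), 1)
                # Structure feature (assumed): first sentence of its paragraph.
                is_first = 1.0 if i == 0 else 0.0
                # Structure feature (assumed): shallower elements score higher.
                shallow = 1.0 / (1 + depth)
                rows.append((sent, [overlap, is_first, shallow]))
        for child in elem:
            walk(child, depth + 1)

    walk(root, 0)
    return rows


def score(w, features):
    """Linear scoring function: dot product of weights and features."""
    return sum(wi * fi for wi, fi in zip(w, features))


def train_pairwise(examples, epochs=20, lr=0.1):
    """Pairwise perceptron: push summary sentences above non-summary ones.

    examples is a list of (features, label); label 1 marks a summary sentence.
    """
    w = [0.0] * len(examples[0][0])
    pos = [f for f, y in examples if y == 1]
    neg = [f for f, y in examples if y == 0]
    for _ in range(epochs):
        for p in pos:
            for n in neg:
                if score(w, p) <= score(w, n):  # pair is in the wrong order
                    w = [wi + lr * (pi - ni) for wi, pi, ni in zip(w, p, n)]
    return w


def summarise(xml_text, w, k=2):
    """Rank sentences by score, keep the top k, return them in document order."""
    rows = sentence_features(xml_text)
    ranked = sorted(range(len(rows)), key=lambda i: score(w, rows[i][1]), reverse=True)
    return [rows[i][0] for i in sorted(ranked[:k])]


# Toy document and labels, invented for this sketch.
DOC = (
    "<article><title>XML summarisation</title><sec>"
    "<p>We study summarisation of XML documents. The weather was mild.</p>"
    "<sec><p>Unrelated remarks appear here. Lunch was served at noon.</p></sec>"
    "</sec></article>"
)
LABELS = [1, 0, 0, 0]  # only the first sentence is in the reference summary

rows = sentence_features(DOC)
weights = train_pairwise([(f, y) for (_, f), y in zip(rows, LABELS)])
print(summarise(DOC, weights, k=1))
```

Because the model is trained on pairs, the learned weights only need to order sentences correctly, not predict an absolute relevance value. The same scoring function could be applied to the sentences of a single `sec` element instead of the whole document, which is one way to read the abstract's remark about elements of varying granularity.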

Dates and versions

hal-01170740 , version 1 (02-07-2015)

Cite

Massih-Reza Amini, Anastasios Tombros, Nicolas Usunier, Mounia Lalmas. Learning Based Summarisation of XML Documents. Information Retrieval Journal, 2007, 10 (3), pp.233-255. ⟨10.1007/s10791-006-9017-1⟩. ⟨hal-01170740⟩