On using a quantum physics formalism for multidocument summarization

Benjamin Piwowarski 1, Massih-Reza Amini 2, Mounia Lalmas
1 BD - Bases de Données
LIP6 - Laboratoire d'Informatique de Paris 6
2 MALIRE - Machine Learning and Information Retrieval
LIP6 - Laboratoire d'Informatique de Paris 6
Abstract: Multidocument summarization (MDS) aims, for each given query, to extract compressed and relevant information with respect to the different query-related themes present in a set of documents. Many approaches operate in two steps: themes are first identified from the set, and a summary is then formed by extracting salient sentences from the documents of each identified theme. Among these, latent semantic analysis (LSA) based approaches rely on spectral decomposition techniques to identify the themes. In this article, we propose a major extension of these techniques that relies on the quantum information access (QIA) framework, a framework developed for modeling information access based on the probabilistic formalism of quantum physics. The QIA framework not only points out the limitations of current LSA-based approaches, but also motivates a new principled criterion for multidocument summarization that addresses these limitations. As a byproduct, it also provides a way to enhance the LSA-based approaches. Extensive experiments on the DUC 2005, 2006 and 2007 datasets show that the proposed approach consistently improves over both the LSA-based approaches and the systems that competed in the yearly DUC competitions. This demonstrates the potential impact of quantum-inspired approaches to information access in general, and of the QIA framework in particular.
Document type:
Journal article
Journal of the American Society for Information Science and Technology, Association for Information Science and Technology (ASIS&T), 2012, 63 (5), pp.865-888. 〈10.1002/asi.21713〉

https://hal.archives-ouvertes.fr/hal-01172685
Contributor: Lip6 Publications
Submitted on: Tuesday, July 7, 2015 - 16:33:53
Last modified on: Saturday, December 1, 2018 - 01:24:56

Citation

Benjamin Piwowarski, Massih-Reza Amini, Mounia Lalmas. On using a quantum physics formalism for multidocument summarization. Journal of the American Society for Information Science and Technology, Association for Information Science and Technology (ASIS&T), 2012, 63 (5), pp.865-888. 〈10.1002/asi.21713〉. 〈hal-01172685〉
