Learning language-independent sentence representations for multi-lingual, multi-document summarization

Abstract: This paper extends the denoising auto-encoder to learn language-independent representations of parallel multilingual sentences. Each sentence in a given language is first represented by a language-dependent distributed representation. The input to the auto-encoder is the concatenation of these vector representations for the translations of the same sentence in different languages. We evaluate the learnt representation on extractive multi-document summarization, using a simple cosine measure to estimate the similarity between the sentence vectors produced by the auto-encoder and the vector of a generic query represented in the same learnt space. The top-ranked sentences are then selected to form the summary. On the TAC 2011 MultiLing collection, we show that, compared to other classical sentence representations, learning language-independent representations of sentences that are translations of one another significantly improves performance with respect to the ROUGE-SU4 measure.
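
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the auto-encoder itself is omitted, and the sentence and query codes are assumed to come from an already trained encoder. All function names are hypothetical, and the masking corruption is an assumed noise type, since the abstract does not say which corruption is used.

```python
import math
import random


def concatenate_translations(vectors_by_language):
    """Join the per-language vectors of one sentence into a single input vector."""
    joined = []
    # A fixed (sorted) language order keeps the dimensions aligned across sentences.
    for language in sorted(vectors_by_language):
        joined.extend(vectors_by_language[language])
    return joined


def mask_noise(vector, drop_prob, rng):
    """Assumed corruption: set each component to zero with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else x for x in vector]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_sentences(sentence_codes, query_code, k):
    """Return the indices of the k sentences most similar to the query, best first."""
    order = sorted(
        range(len(sentence_codes)),
        key=lambda i: cosine(sentence_codes[i], query_code),
        reverse=True,
    )
    return order[:k]


# Toy usage: three 2-dimensional sentence codes and one query code.
codes = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
query = [1.0, 0.1]
print(top_sentences(codes, query, k=2))  # prints [0, 1]
```

In this sketch, `concatenate_translations` and `mask_noise` would prepare the corrupted input for training, while `top_sentences` corresponds to the cosine-based selection step applied to the learnt codes.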
Document type:
Conference paper
Conférence sur l'Apprentissage Automatique (CAp 2015), Jul 2015, Lille, France. 〈http://cap2015.sciencesconf.org/〉

https://hal.archives-ouvertes.fr/hal-01236592
Contributor: Massih-Reza Amini
Submitted on: Tuesday, December 1, 2015 - 22:28:16
Last modified on: Wednesday, November 29, 2017 - 15:25:09

Identifiers

  • HAL Id : hal-01236592, version 1

Citation

Georgios Balikas, Massih-Reza Amini. Learning language-independent sentence representations for multi-lingual, multi-document summarization. Conférence sur l'Apprentissage Automatique (CAp 2015), Jul 2015, Lille, France. 〈http://cap2015.sciencesconf.org/〉. 〈hal-01236592〉