Conference papers

Learning language-independent sentence representations for multi-lingual, multi-document summarization

Abstract: This paper presents an extension of a denoising auto-encoder that learns language-independent representations of parallel multilingual sentences. Each sentence is first represented with language-dependent distributed representations; the input to the auto-encoder is the concatenation of the distributed representations of the translations of the same sentence in the different languages. We show the effectiveness of the learnt representation for extractive multi-document summarization, using a simple cosine measure that estimates the similarity between the sentence vectors found by the auto-encoder and the vector of a generic query represented in the same learnt space. The top-ranked sentences are then selected to generate the summary. Compared with other classical sentence representations, we demonstrate the effectiveness of our approach on the TAC 2011 MultiLing collection and show that learning language-independent representations of sentences that are translations of one another significantly improves performance with respect to the Rouge-SU4 measure.
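The sentence-selection step described above can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes sentence and query vectors have already been produced by the learnt auto-encoder (here they are just toy NumPy arrays), and ranks sentences by cosine similarity to the query, keeping the top-k for the summary.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_sentences(sentence_vecs, query_vec, k=2):
    # Score each sentence vector against the query vector and return
    # the indices of the k highest-scoring sentences, best first.
    scores = [cosine(v, query_vec) for v in sentence_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Toy vectors standing in for representations learnt by the auto-encoder.
sentence_vecs = [np.array([1.0, 0.0]),
                 np.array([0.0, 1.0]),
                 np.array([0.7, 0.7])]
query_vec = np.array([1.0, 0.1])

top = rank_sentences(sentence_vecs, query_vec, k=2)
print(top)  # → [0, 2]
```

In the paper's setting the same ranking is applied in the shared multilingual space, so sentences from any language can be scored against a single generic query.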
Metadata

https://hal.archives-ouvertes.fr/hal-01236592
Contributor: Massih-Reza Amini
Submitted on: Tuesday, December 1, 2015 - 10:28:16 PM
Last modification on: Monday, April 20, 2020 - 11:24:01 AM

Identifiers

  • HAL Id: hal-01236592, version 1

Collections

CNRS | LIG | UGA

Citation

Georgios Balikas, Massih-Reza Amini. Learning language-independent sentence representations for multi-lingual, multi-document summarization. Conférence sur l'Apprentissage Automatique (CAp 2015), Jul 2015, Lille, France. ⟨hal-01236592⟩
