Learning language-independent sentence representations for multi-lingual, multi-document summarization

Georgios Balikas; Massih-Reza Amini

Conference paper, Year: 2015

Georgios Balikas

Role: Author

Analyse de données, Modélisation et Apprentissage automatique [Grenoble]

Massih-Reza Amini

Role: Author
PersonId : 747054
IdHAL : massih-reza-amini
ORCID : 0000-0001-9032-4233
IdRef : 132277042

Analyse de données, Modélisation et Apprentissage automatique [Grenoble]

Abstract

This paper presents an extension of a denoising auto-encoder that learns language-independent representations of parallel multilingual sentences. Each sentence in a given language is first represented by a language-dependent distributed representation. The input of the auto-encoder is then the concatenation of the distributed representations of the translations of the same sentence in different languages. We show the effectiveness of the learnt representation for extractive multi-document summarization, using a simple cosine measure that estimates the similarity between the sentence vectors produced by the auto-encoder and the vector of a generic query represented in the same learnt space. The top-ranked sentences are then selected to generate the summary. We evaluate our approach on the TAC 2011 MultiLing collection against other classical sentence representations and show that learning language-independent representations of sentences that are translations of one another significantly improves performance with respect to the ROUGE-SU4 measure.
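The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`concat_translations`, `cosine`, `rank_sentences`) are hypothetical, the vectors are toy values, and the auto-encoder itself is not shown. The sketch assumes the sentence and query vectors have already been mapped into the learnt space, and only illustrates how the auto-encoder input could be assembled and how sentences could be ranked by cosine similarity.

```python
import math


def concat_translations(vectors_by_language, language_order):
    """Build one input vector by concatenating the language-dependent
    vectors of the same sentence, in a fixed language order.
    (Hypothetical helper; the encoder that consumes it is not shown.)"""
    combined = []
    for language in language_order:
        combined.extend(vectors_by_language[language])
    return combined


def cosine(u, v):
    """Cosine similarity between two equal-length vectors.
    Returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sentences(sentence_vectors, query_vector, top_k):
    """Return the indices of the top_k sentences most similar to the query.
    Ties are broken by original sentence order."""
    scored = [(cosine(vec, query_vector), idx)
              for idx, vec in enumerate(sentence_vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:top_k]]


# Toy usage: three sentences and a query, assumed to live in the learnt space.
sentences = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
summary_indices = rank_sentences(sentences, query, top_k=2)
```

In this toy setting the sentences at the returned indices would form the extractive summary. How the generic query is built and how the summary length is bounded are not specified in the abstract and are left out here.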

Keywords

Multidocument text summarization; representation learning

Domains

Machine Learning [cs.LG]

https://hal.science/hal-01236592

Submitted on: Tuesday, December 1, 2015, 22:28:16

Last modified on: Thursday, April 4, 2024, 21:27:28

Dates and versions

hal-01236592, version 1 (01-12-2015)

Identifiers

HAL Id: hal-01236592, version 1

Cite

Georgios Balikas, Massih-Reza Amini. Learning language-independent sentence representations for multi-lingual, multi-document summarization. Conférence sur l'Apprentissage Automatique (CAp 2015), Jul 2015, Lille, France. ⟨hal-01236592⟩

Collections

UGA CNRS LIG LIG_SIDCH LIG_SIDCH_APTIKAL
