Conference paper, Year: 2017

Unsupervised Word Discovery Using Attentional Encoder-Decoder Models

Abstract

Attention-based sequence-to-sequence neural machine translation systems have been shown to jointly align and translate source sentences into target sentences. In this project we use unsegmented symbol sequences (characters and phonemes) as the source, aiming to explore the soft-alignment probability matrices generated during training and to evaluate whether these soft alignments allow us to discover latent lexicon representations. If successful, such an approach could be useful for documenting unwritten and/or endangered languages. However, for this to be feasible, attention models must be robust to low-resource scenarios of only a few thousand sentences. We use a parallel corpus between the endangered language Mboshi and French, as well as a larger and more controlled English-French parallel corpus. Our goal is to explore different representation levels and to study their impact, together with the impact of different data set sizes, on the quality of the generated soft-alignment probability matrices.
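To make the idea concrete, here is a minimal, hypothetical sketch (not the authors' released implementation) of how a soft-alignment matrix could be turned into word boundaries: consecutive source symbols whose attention mass peaks on the same target word are grouped into a candidate lexical unit. The function and variable names below are illustrative assumptions.

```python
"""Illustrative sketch: discovering word-like units from the soft-alignment
(attention) matrix of a sequence-to-sequence model.

Assumption: `attention` is a (target_len x source_len) matrix of attention
weights obtained while translating an unsegmented symbol sequence.
"""
import numpy as np


def segment_from_attention(source_symbols, attention):
    """Group consecutive source symbols whose strongest attention weight
    points to the same target token; each group is a candidate word."""
    # For every source position, find the target token it aligns to most.
    alignments = np.argmax(attention, axis=0)  # shape: (source_len,)

    segments, current = [], [source_symbols[0]]
    for i in range(1, len(source_symbols)):
        if alignments[i] == alignments[i - 1]:
            current.append(source_symbols[i])   # same target token: same unit
        else:
            segments.append("".join(current))   # alignment change: boundary
            current = [source_symbols[i]]
    segments.append("".join(current))
    return segments


# Toy example: 6 source characters attended to by 2 target tokens.
symbols = list("bonjou")
att = np.array([[0.9, 0.8, 0.7, 0.1, 0.2, 0.1],
                [0.1, 0.2, 0.3, 0.9, 0.8, 0.9]])
print(segment_from_attention(symbols, att))  # -> ['bon', 'jou']
```

In practice such a heuristic would be applied to attention matrices averaged over training, and its output compared against gold word segmentations to evaluate lexicon discovery quality.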
Main file: 44_Paper.pdf (116.35 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02895851 , version 1 (10-07-2020)

Identifiers

  • HAL Id : hal-02895851 , version 1

Cite

Marcely Zanon Boito, Laurent Besacier, Aline Villavicencio. Unsupervised Word Discovery Using Attentional Encoder-Decoder Models. WiNLP workshop, ACL 2017, Jul 2017, Vancouver, Canada. ⟨hal-02895851⟩
