Conference paper, Year: 2017

Unsupervised Word Discovery Using Attentional Encoder-Decoder Models

Abstract

Attention-based sequence-to-sequence neural machine translation systems have been shown to jointly align and translate source sentences into target sentences. In this project we use unsegmented symbol sequences (characters and phonemes) as the source, aiming to explore the soft-alignment probability matrices generated during training and to evaluate whether these soft alignments allow us to discover latent lexicon representations. If successful, such an approach could be useful for documenting unwritten and/or endangered languages. However, for this to be feasible, attention models must be robust to low-resource scenarios of only a few thousand sentences. We use a parallel corpus between the endangered language Mboshi and French, as well as a larger and more controlled English-French parallel corpus. Our goal is to explore different representation levels and to study their impact, together with the impact of different data set sizes, on the quality of the generated soft-alignment probability matrices.
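To make the idea concrete, here is a minimal, hypothetical sketch (not the authors' released implementation) of how a soft-alignment matrix could be turned into word boundaries: consecutive source symbols whose attention mass peaks on the same target word are grouped into a candidate lexical unit. The function and variable names below are illustrative assumptions.

```python
"""Illustrative sketch: discovering word-like units from the soft-alignment
(attention) matrix of a sequence-to-sequence model.

Assumption: `attention` is a (target_len x source_len) matrix of attention
weights obtained while translating an unsegmented symbol sequence.
"""
import numpy as np


def segment_from_attention(source_symbols, attention):
    """Group consecutive source symbols whose strongest attention weight
    points to the same target token; each group is a candidate word."""
    # For every source position, find the target token it aligns to most.
    alignments = np.argmax(attention, axis=0)  # shape: (source_len,)

    segments, current = [], [source_symbols[0]]
    for i in range(1, len(source_symbols)):
        if alignments[i] == alignments[i - 1]:
            current.append(source_symbols[i])   # same target token: same unit
        else:
            segments.append("".join(current))   # alignment change: boundary
            current = [source_symbols[i]]
    segments.append("".join(current))
    return segments


# Toy example: 6 source characters attended to by 2 target tokens.
symbols = list("bonjou")
att = np.array([[0.9, 0.8, 0.7, 0.1, 0.2, 0.1],
                [0.1, 0.2, 0.3, 0.9, 0.8, 0.9]])
print(segment_from_attention(symbols, att))  # -> ['bon', 'jou']
```

In practice such a heuristic would be applied to attention matrices averaged over training, and its output compared against gold word segmentations to evaluate lexicon discovery quality.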
Main file: 44_Paper.pdf (116.35 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02895851 , version 1 (10-07-2020)

Identifiers

  • HAL Id : hal-02895851 , version 1

Cite

Marcely Zanon Boito, Laurent Besacier, Aline Villavicencio. Unsupervised Word Discovery Using Attentional Encoder-Decoder Models. WiNLP workshop, ACL 2017, Jul 2017, Vancouver, Canada. ⟨hal-02895851⟩
