
Unsupervised Word Segmentation from Speech with Attention

Abstract: We present a first attempt to perform attentional word segmentation directly from the speech signal, with the ultimate goal of automatically identifying lexical units in a low-resource, unwritten language (UL). Our methodology assumes a pairing between recordings in the UL and translations in a well-resourced language. It uses Acoustic Unit Discovery (AUD) to convert speech into a sequence of pseudo-phones, which is then segmented using soft alignments produced by a neural machine translation model. Evaluation uses an actual Bantu UL, Mboshi; comparisons to monolingual and bilingual baselines illustrate the potential of attentional word segmentation for language documentation.
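The segmentation step can be illustrated with a minimal sketch. It assumes the translation model exposes an attention matrix with one row per pseudo-phone and one column per word of the translation, hard-aligns each pseudo-phone to its highest-weighted translation word, and places a word boundary wherever that aligned word changes between adjacent pseudo-phones. The function name `segment_from_attention` and the toy unit labels are hypothetical, and this argmax rule is one plausible way to turn soft alignments into boundaries, not necessarily the exact procedure used in the paper.

```python
from typing import List, Optional, Sequence


def segment_from_attention(
    units: Sequence[str],
    attention: Sequence[Sequence[float]],
) -> List[List[str]]:
    """Group pseudo-phone units into word-like segments (hypothetical helper).

    units: pseudo-phone labels, e.g. as produced by an AUD system.
    attention: attention[i][j] is the soft-alignment weight between unit i
        and word j of the translation (assumed layout: rows = units,
        columns = translation words).
    """
    if len(units) != len(attention):
        raise ValueError("expected exactly one attention row per unit")

    segments: List[List[str]] = []
    prev_word: Optional[int] = None
    for unit, row in zip(units, attention):
        # Hard-align the unit to the translation word with the largest weight
        # (ties go to the earliest word index).
        aligned = max(range(len(row)), key=row.__getitem__)
        # Start a new segment whenever the aligned word differs from the
        # previous unit's aligned word.
        if aligned != prev_word:
            segments.append([])
            prev_word = aligned
        segments[-1].append(unit)
    return segments


if __name__ == "__main__":
    toy_units = ["a", "b", "c", "d", "e"]
    toy_attention = [
        [0.9, 0.1],
        [0.8, 0.2],
        [0.3, 0.7],
        [0.1, 0.9],
        [0.6, 0.4],
    ]
    print(segment_from_attention(toy_units, toy_attention))
```

In the toy example the per-unit argmaxes are words 0, 0, 1, 1, 0, so the call returns `[['a', 'b'], ['c', 'd'], ['e']]`. The final unit opens a new segment even though it realigns to word 0, because this rule compares only adjacent units and does not merge non-contiguous units aligned to the same word.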
Document type: Conference papers

Cited literature: 33 references
Contributor: Laurent Besacier
Submitted on: Monday, June 18, 2018 - 4:41:32 PM
Last modification on: Wednesday, September 16, 2020 - 5:51:21 PM
Long-term archiving on: Wednesday, September 26, 2018 - 6:25:36 PM


Files produced by the author(s)


HAL Id: hal-01818092, version 1


Pierre Godard, Marcely Zanon Boito, Lucas Ondel, Alexandre Berard, François Yvon, et al. Unsupervised Word Segmentation from Speech with Attention. Interspeech 2018, Sep 2018, Hyderabad, India. ⟨hal-01818092⟩


