D. Bahdanau, K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate, 2014.

M. Elbayad, L. Besacier, and J. Verbeek, Pervasive attention: 2D convolutional neural networks for sequence-to-sequence prediction, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01851612

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones et al., Attention is all you need, Advances in Neural Information Processing Systems, pp.5998-6008, 2017.

J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin, Convolutional sequence to sequence learning, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.1243-1252, 2017.

P. Godard, M. Boito, L. Ondel, A. Bérard, F. Yvon et al., Unsupervised word segmentation from speech with attention, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01818092

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to sequence learning with neural networks, Advances in Neural Information Processing Systems, vol.27, pp.3104-3112, 2014.

G. Adda, S. Stüker, M. Adda-Decker, O. Ambouroue, L. Besacier et al., Breaking the unwritten language barrier: The BULB project, Procedia Computer Science, vol.81, pp.8-14, 2016.
URL : https://hal.archives-ouvertes.fr/halshs-01428027

A. Anastasopoulos and D. Chiang, A case study on using speech-to-translation alignments for language documentation, 2017.

L. Besacier, B. Zhou, and Y. Gao, Towards speech translation of non written languages, Spoken Language Technology Workshop, pp.222-225, 2006.

C. Lignos and C. Yang, Recession segmentation: simpler online word segmentation using limited resources, Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pp.88-97, 2010.

C. Bartels, W. Wang, V. Mitra, C. Richey, A. Kathol et al., Toward human-assisted lexical unit discovery without text resources, Spoken Language Technology Workshop (SLT), pp.64-70, 2016.

P. K. Austin and J. Sallabank, The Cambridge handbook of endangered languages, 2011.

K. Song, T. Xu, F. Peng, and J. Lu, Hybrid self-attention network for machine translation, 2018.

J. Li, Z. Tu, B. Yang, M. R. Lyu, and T. Zhang, Multi-head attention with disagreement regularization, 2018.

E. Voita, D. Talbot, F. Moiseev, R. Sennrich, and I. Titov, Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned, 2019.

T. Zenkel, J. Wuebker, and J. DeNero, Adding interpretable attention to neural translation models improves word alignment, 2019.

L. Ondel, L. Burget, and J. Černocký, Variational inference for acoustic unit discovery, Procedia Computer Science, vol.81, pp.80-86, 2016.

A. C. Kocabiyikoglu, L. Besacier, and O. Kraif, Augmenting LibriSpeech with French translations: A multimodal corpus for direct speech translation evaluation, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01709568

P. Godard, G. Adda, M. Adda-Decker, J. Benjumea, L. Besacier et al., A very low resource language speech corpus for computational language documentation experiments, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01807093

V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, Librispeech: An ASR corpus based on public domain audio books, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, vol.2015, pp.5206-5210, 2015.

Z. Lin, M. Feng, C. Santos, M. Yu, B. Xiang et al., A structured self-attentive sentence embedding, 2017.

A. Bérard, O. Pietquin, C. Servan, and L. Besacier, Listen and translate: A proof of concept for end-to-end speech-to-text translation, 2016.

M. Ott, S. Edunov, A. Baevski, A. Fan, S. Gross et al., fairseq: A fast, extensible toolkit for sequence modeling, 2019.

P. Godard, Unsupervised word discovery for computational language documentation, 2019.

S. Goldwater, T. L. Griffiths, and M. Johnson, A Bayesian framework for word segmentation: Exploring the effects of context, Cognition, vol.112, issue.1, pp.21-54, 2009.

A. Fourtassi, B. Börschinger, M. Johnson, and E. Dupoux, Why is English so easy to segment?, Proceedings of the Fourth Annual Workshop on Cognitive Modeling and Computational Linguistics (CMCL), pp.1-10, 2013.

A. Rialland, M. E. Aborobongui, M. Adda-Decker, and L. Lamel, Dropping of the class-prefix consonant, vowel elision and automatic phonological mining in Embosi (Bantu C25), Selected Proceedings of the 44th Annual Conference on African Linguistics, pp.7-10, 2015.
URL : https://hal.archives-ouvertes.fr/halshs-01251202