G. Adda, S. Stücker, M. Adda-Decker, O. Ambouroue, L. Besacier et al., Breaking the Unwritten Language Barrier: The BULB Project, Proceedings of SLTU (Spoken Language Technologies for Under-Resourced Languages), 2016.
DOI : 10.1016/j.procs.2016.04.023

URL : https://hal.archives-ouvertes.fr/halshs-01428027

A. Anastasopoulos and D. Chiang, A case study on using speech-to-translation alignments for language documentation. arXiv preprint, 2017.

S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak et al., DBpedia: A nucleus for a web of open data. The Semantic Web, pp.722-735, 2007.

D. Bahdanau, K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate. arXiv preprint, 2014.

A. Bérard, O. Pietquin, C. Servan, and L. Besacier, Listen and translate: A proof of concept for end-to-end speech-to-text translation, NIPS workshop on End-to-end Learning for Speech and Audio Processing, 2016.

A. Bérard, L. Besacier, A. C. Kocabiyikoglu, and O. Pietquin, End-to-end automatic speech translation of audiobooks, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.

S. Bird, NLTK: the Natural Language Toolkit, Proceedings of the COLING/ACL on Interactive presentation sessions, pp.69-72, 2006.

D. Blachon, E. Gauthier, L. Besacier, G. Kouarata, M. Adda-Decker et al., Parallel Speech Collection for Under-resourced Language Studies Using the Lig-Aikuma Mobile Device App, Proceedings of SLTU (Spoken Language Technologies for Under-Resourced Languages), 2016.
DOI : 10.1016/j.procs.2016.04.030

URL : https://hal.archives-ouvertes.fr/hal-01350065

C. Federmann and W. D. Lewis, Microsoft Speech Language Translation (MSLT) corpus: The IWSLT 2016 release for English, French and German, 2016.

J. Ferrero, F. Agnes, L. Besacier, and D. Schwab, A multilingual, multi-style and multi-granularity dataset for cross-language textual similarity detection, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01303135

N. Fraser, google-diff-match-patch: diff, match and patch libraries for plain text, 2012.

E. Matusov, G. Leusch, O. Bender, and H. Ney, Evaluating machine translation output with automatic sentence segmentation, International Workshop on Spoken Language Translation (IWSLT), 2005.

V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, Librispeech: An ASR corpus based on public domain audio books, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5206-5210, 2015.
DOI : 10.1109/ICASSP.2015.7178964

URL : http://www.clsp.jhu.edu/%7Eguoguo/papers/icassp2015_librispeech.pdf

M. Post, G. Kumar, A. Lopez, D. Karakos, C. Callison-Burch et al., Improved speech-to-text translation with the Fisher and Callhome Spanish-English speech translation corpus, 2013.

D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek et al., The Kaldi speech recognition toolkit, IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, p.192584, 2011.

D. Varga, P. Halácsy, A. Kornai, V. Nagy, L. Németh et al., Parallel corpora for medium density languages. Amsterdam Studies in the Theory and History of Linguistic Science Series 4, p.247, 2007.
DOI : 10.1075/cilt.292.32var

URL : http://eprints.sztaki.hu/7902/1/Kornai_1762382_ny.pdf

R. J. Weiss, J. Chorowski, N. Jaitly, Y. Wu et al., Sequence-to-sequence models can directly transcribe foreign speech. arXiv preprint, 2017.
DOI : 10.21437/interspeech.2017-503

URL : http://arxiv.org/pdf/1703.08581