A simple framework for contrastive learning of visual representations, 2020. ,
BERT: pre-training of deep bidirectional transformers for language understanding, CoRR, 2018. ,
Effectiveness of selfsupervised pre-training for speech recognition, 2019. ,
Learning robust and multilingual speech representations, 2020. ,
Generative pre-training for speech with autoregressive predictive coding, 2019. ,
An unsupervised autoregressive model for speech representation learning, CoRR, 2019. ,
Improved speech representations with multi-target autoregressive predictive coding, 2020. ,
wav2vec: Unsupervised Pre-Training for Speech Recognition, Proc. Interspeech, pp.3465-3469, 2019. ,
,
Libri-light: A benchmark for asr with limited or no supervision, 2019. ,
Unsupervised pretraining transfers well across languages, 2020. ,
Multi-task self-supervised learning for robust speech recognition, 2020. ,
, Ddsp: Differentiable digital signal processing, 2020.
Learning problem-agnostic speech representations from multiple self-supervised tasks, 2019. ,
Listen and translate: A proof of concept for end-to-end speech-to-text translation, NIPS Workshop on End-to-end Learning for Speech and Audio Processing, 2016. ,
Sequence-to-sequence models can directly transcribe foreign speech, Proc. of INTERSPEECH, 2017. ,
End-to-end automatic speech translation of audiobooks, CoRR, 2018. ,
Pre-training on high-resource speech recognition improves lowresource speech-to-text translation, CoRR, 2018. ,
Towards unsupervised speech-to-text translation, CoRR, 2018. ,
Direct speech-to-speech translation with a sequence-to-sequence model, CoRR, 2019. ,
MuST-C: a Multilingual Speech Translation Corpus, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.1, pp.2012-2017, 2019. ,
Leveraging weakly supervised data to improve end-to-end speech-totext translation, CoRR, 2018. ,
Attentionpassing models for robust and data-efficient end-to-end speech translation, CoRR, 2019. ,
Lib-riSpeech: an ASR corpus based on public domain audio books, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.5206-5210, 2015. ,
How2: a large-scale dataset for multimodal language understanding, ViGIL Workshop, NeurIPS, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-02431947
ON-TRAC consortium end-to-end speech translation systems for the IWSLT 2019 shared task, Proc. of IWSLT, 2019. ,
URL : https://hal.archives-ouvertes.fr/hal-02352949
Very deep convolutional networks for large-scale image recognition, Proc. of ICLR, 2015. ,
Neural Machine Translation by Jointly Learning to Align and Translate, Proc. of ICLR, 2015. ,
ESPnet-ST: All-in-one speech translation toolkit, 2020. ,
The iwslt 2019 evaluation campaign, Proceedings of the 16th International Workshop on Spoken Language Translation, 2019. ,
DARPA TIMIT acoustic phonetic continuous speech corpus cdrom, 1993. ,
Towards accurate predictors of word quality for machine translation: Lessons learned on french -english and english -spanish systems, Data and Knowledge Engineering, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01147902
The Kaldi speech recognition toolkit, Tech. Rep, 2011. ,
X-vectors: Robust DNN embeddings for speaker recognition, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.5329-5333, 2018. ,
VoxCeleb: a largescale speaker identification dataset, pp.2616-2620, 2017. ,
Introducing the VoicePrivacy initiative, 2020. ,
URL : https://hal.archives-ouvertes.fr/hal-02562199