A. Hannun et al., Deep Speech: Scaling up end-to-end speech recognition, 2014.

A. Graves and N. Jaitly, Towards end-to-end speech recognition with recurrent neural networks, International Conference on Machine Learning, pp.1764-1772, 2014.

D. Amodei et al., Deep Speech 2: End-to-end speech recognition in English and Mandarin, International Conference on Machine Learning, pp.173-182, 2016.

Y. Miao, M. Gowayyed, and F. Metze, EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding, ASRU, pp.167-174, 2015.

Y. Wang, R. Skerry-Ryan, D. Stanton, and Y. Wu, Tacotron: Towards end-to-end speech synthesis, 2017.

A. Bérard, O. Pietquin, C. Servan, and L. Besacier, Listen and translate: A proof of concept for end-to-end speech-to-text translation, 2016.

G. Heigold, I. Moreno, S. Bengio, and N. Shazeer, End-to-end text-dependent speaker verification, ICASSP, 2016.

Y. Qian, R. Ubale, V. Ramanarayanan, P. Lange, D. Suendermann-Oeft et al., Exploring ASR-free end-to-end modeling to improve spoken language understanding in a cloud-based dialog system, pp.569-576, 2017.

P. Haghani, A. Narayanan, M. Bacchiani, G. Chuang, N. Gaur et al., From audio to semantics: Approaches to end-to-end spoken language understanding, 2018.

D. Serdyuk, Y. Wang, C. Fuegen, A. Kumar, B. Liu et al., Towards end-to-end spoken language understanding, 2018.

S. Ghannay, A. Caubrière, Y. Estève, N. Camelin, E. Simonnet et al., End-to-end named entity and semantic concept extraction from speech, SLT, pp.692-699, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01987740

Y. Chen, R. Price, and S. Bangalore, Spoken language understanding without speech recognition, ICASSP, 2018.

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to sequence learning with neural networks, Advances in Neural Information Processing Systems, pp.3104-3112, 2014.

A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, Proceedings of the 23rd International Conference on Machine Learning, pp.369-376, 2006.

D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, and Y. Bengio, End-to-end attention-based large vocabulary speech recognition, ICASSP, pp.4945-4949, 2016.

N. Tomashenko and Y. Estève, Evaluation of feature-space speaker adaptation for end-to-end acoustic models, LREC, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01728526

M. Delcroix, S. Watanabe, A. Ogawa, S. Karita, and T. Nakatani, Auxiliary feature based adaptation of end-to-end ASR systems, 2018.

T. Ochiai, S. Watanabe, S. Katagiri, T. Hori, and J. Hershey, Speaker adaptation for multichannel end-to-end speech recognition, ICASSP, pp.6707-6711, 2018.

K. Li, J. Li, Y. Zhao, K. Kumar, and Y. Gong, Speaker adaptation for end-to-end CTC models, SLT, pp.542-549, 2018.

G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, Speaker adaptation of neural network acoustic models using i-vectors, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, pp.55-59, 2013.

N. Dehak, P. J. Kenny et al., Front-end factor analysis for speaker verification, IEEE Transactions on Audio, Speech, and Language Processing, pp.788-798, 2011.

M. J. Gales, Maximum likelihood linear transformations for HMM-based speech recognition, Computer Speech and Language, pp.75-98, 1998.

J. Gauvain and C. Lee, Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, IEEE Transactions on Speech and Audio Processing, 1994.

N. Tomashenko and Y. Khokhlov, Speaker adaptation of context dependent deep neural networks based on MAP-adaptation and GMM-derived feature processing, 2014.

M. Delcroix, K. Kinoshita, A. Ogawa, T. Yoshioka, D. T. Tran et al., Context adaptive neural network for rapid adaptation of deep CNN based acoustic models, Interspeech, pp.1573-1577, 2016.

S. Deena, R. W. Ng, P. Madhyastha, L. Specia, and T. Hain, Semi-supervised adaptation of RNNLMs by fine-tuning with domain-specific auxiliary features, pp.2715-2719, 2017.

S. J. Pan and Q. Yang, A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering, 2010.

A. Caubrière and N. Tomashenko, Curriculum-based transfer learning for an effective end-to-end spoken language understanding and domain portability, 2019.

Y. Estève, T. Bazillon, J. Antoine, F. Béchet, and J. Farinas, The EPAC corpus: Manual and automatic annotations of conversational speech in French broadcast news, LREC, 2010.
URL : https://hal.archives-ouvertes.fr/hal-01433895

S. Galliano, G. Gravier, and L. Chaubard, The ESTER 2 evaluation campaign for the rich transcription of French radio broadcasts, Interspeech, 2009.

G. Gravier, G. Adda, N. Paulson, M. Carré, A. Giraudel et al., The ETAPE corpus for the evaluation of speech-based TV content processing in the French language, LREC, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00712591

A. Giraudel, M. Carré, V. Mapelli, J. Kahn, O. Galibert et al., The REPERE corpus: a multimodal corpus for person recognition, LREC, pp.1102-1107, 2012.

F. Béchet, B. Maza, N. Bigouroux, T. Bazillon, M. El-Bèze et al., DECODA: a call-centre human-human spoken conversation corpus, LREC, 2012.

L. Devillers, H. Maynard, S. Rosset, P. Paroubek, K. McTait et al., The French MEDIA/EVALDA project: the evaluation of the understanding capability of spoken language dialogue systems, LREC, 2004.

F. Lefèvre, D. Mostefa, L. Besacier, Y. Estève, M. Quignard et al., Robustness and portability of spoken language understanding systems among languages and domains: the PortMedia project, pp.779-786

E. Simonnet, S. Ghannay, N. Camelin, and Y. Estève, Simulating ASR errors for training SLU systems, LREC, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01715923

H. Bonneau-Maynard, C. Ayache, F. Béchet, A. Denis, A. Kuhn et al., Results of the French Evalda-Media evaluation campaign for literal understanding, LREC, 2006.
URL : https://hal.archives-ouvertes.fr/hal-01160167

A. Rousseau, P. Deléglise, and Y. Estève, Enhancing the TED-LIUM corpus with selected data for language modeling and more TED talks, LREC, pp.3935-3939, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01433246

C. Grouin, S. Rosset, P. Zweigenbaum, K. Fort, O. Galibert et al., Proposal for an extension of traditional named entities: From guidelines to evaluation, an overview, Proceedings of the 5th Linguistic Annotation Workshop, pp.92-100, 2011.

V. Vukotic, C. Raymond, and G. Gravier, Is it time to switch to word embedding and recurrent neural networks for spoken language understanding?, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01196915

C. Laurent, G. Pereyra, P. Brakel, Y. Zhang, and Y. Bengio, Batch normalized recurrent neural networks, ICASSP, 2016.

D. Povey, A. Ghoshal et al., The Kaldi speech recognition toolkit, ASRU, 2011.