D. Amodei, Deep speech 2: End-to-end speech recognition in English and Mandarin, International conference on machine learning, pp.173-182, 2016.

D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, and Y. Bengio, End-to-end attention-based large vocabulary speech recognition, pp.4945-4949, 2016.

F. Béchet, B. Maza, N. Bigouroux, T. Bazillon, and M. El-Bèze, DECODA: a call-centre human-human spoken conversation corpus, LREC, pp.1343-1347, 2012.

H. Bonneau-Maynard, C. Ayache, and F. Béchet, Results of the French Evalda-Media evaluation campaign for literal understanding, 2006.
URL : https://hal.archives-ouvertes.fr/hal-01160167

Y. P. Chen, R. Price, and S. Bangalore, Spoken language understanding without speech recognition, 2018.

S. Deena, Semi-supervised adaptation of RNNLMs by fine-tuning with domain-specific auxiliary features, pp.2715-2719, 2017.

M. Delcroix, S. Watanabe, A. Ogawa, S. Karita, and T. Nakatani, Auxiliary feature based adaptation of end-to-end ASR systems, pp.2444-2448, 2018.

L. Devillers, The French MEDIA/EVALDA project: the evaluation of the understanding capability of spoken language dialogue systems, 2004.

Y. Estève, T. Bazillon, J. Y. Antoine, F. Béchet, and J. Farinas, The EPAC corpus: Manual and automatic annotations of conversational speech in French broadcast news, 2010.

S. Galliano, The ESTER 2 evaluation campaign for the rich transcription of French radio broadcasts, 2009.

J. Gao, M. Galley, and L. Li, Neural approaches to conversational AI, Foundations and Trends in Information Retrieval, pp.127-298, 2019.

S. Ghannay, A. Caubrière, and Y. Estève, End-to-end named entity and semantic concept extraction from speech, SLT, pp.692-699, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01987740

A. Giraudel, M. Carré, V. Mapelli, J. Kahn, O. Galibert et al., The REPERE corpus: a multimodal corpus for person recognition, LREC, pp.1102-1107, 2012.

A. Graves, Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, Proceedings of the 23rd international conference on Machine learning, pp.369-376, 2006.

G. Gravier, G. Adda, and N. Paulson, The ETAPE corpus for the evaluation of speech-based TV content processing in the French language, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00712591

C. Grouin, S. Rosset, P. Zweigenbaum, K. Fort, O. Galibert et al., Proposal for an extension of traditional named entities: From guidelines to evaluation, an overview, Proceedings of the 5th Linguistic Annotation Workshop, pp.92-100, 2011.

P. Haghani, From audio to semantics: Approaches to end-to-end spoken language understanding, 2018.

F. Lefèvre, Robustness and portability of spoken language understanding systems among languages and domains: the PortMedia project, pp.779-786, 2012.

L. Lugosch, M. Ravanelli, P. Ignoto, V. S. Tomar, and Y. Bengio, Speech model pre-training for end-to-end spoken language understanding, 2019.

S. J. Pan and Q. Yang, A survey on transfer learning, pp.1345-1359, 2010.

D. Povey and A. Ghoshal, The Kaldi speech recognition toolkit, 2011.

Y. Qian and R. Ubale, Exploring ASR-free end-to-end modeling to improve spoken language understanding in a cloud-based dialog system, pp.569-576, 2017.

L. A. Ramshaw and M. P. Marcus, Text chunking using transformation-based learning, Natural language processing using very large corpora, pp.157-176, 1999.

A. Rousseau, P. Deléglise, and Y. Estève, Enhancing the TED-LIUM corpus with selected data for language modeling and more TED talks, LREC, pp.3935-3939, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01433246

G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, Speaker adaptation of neural network acoustic models using i-vectors, pp.55-59, 2013.

D. Serdyuk, Y. Wang, C. Fuegen, A. Kumar, B. Liu et al., Towards end-to-end spoken language understanding, 2018.

E. Simonnet, Simulating ASR errors for training SLU systems, LREC, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01715923

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to sequence learning with neural networks, Advances in neural information processing systems, pp.3104-3112, 2014.

V. Vukotic, C. Raymond, and G. Gravier, Is it time to switch to word embedding and recurrent neural networks for spoken language understanding?, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01196915