F. Kubala, R. Schwartz, R. Stone, and R. Weischedel, Named entity extraction from speech, Broadcast News Transcription and Understanding Workshop, pp.287-292, 1998.

A. L. Gorin, G. Riccardi, and J. H. Wright, How may I help you?, Speech Communication, vol.23, issue.1-2, pp.113-127, 1997.

S. Yaman, L. Deng, D. Yu, Y. Wang, and A. Acero, An integrative and discriminative technique for spoken utterance classification, IEEE Transactions on Audio, Speech, and Language Processing, vol.16, issue.6, pp.1207-1214, 2008.

G. Tur, L. Deng, D. Hakkani-Tür, and X. He, Towards deeper understanding: Deep convex networks for semantic utterance classification, ICASSP, pp.5045-5048, 2012.

M. Morchid, G. Linarès, M. El-Bèze, and R. De Mori, Theme identification in telephone service conversations using quaternions of speech features, Interspeech, pp.1394-1398, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01339930

G. Tur and R. De Mori, Spoken language understanding: Systems for extracting semantic information from speech, 2011.

Y.-N. Chen, W. Y. Wang, and A. I. Rudnicky, Unsupervised induction and filling of semantic slots for spoken dialogue systems using frame-semantic parsing, IEEE Workshop on Automatic Speech Recognition and Understanding, pp.120-125, 2013.

E. Simonnet, S. Ghannay, N. Camelin, Y. Estève, and R. De Mori, ASR error management for improving spoken language understanding, Interspeech, pp.3329-3333, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01526298

S. Ghannay, A. Caubrière, Y. Estève, N. Camelin, E. Simonnet et al., End-to-end named entity and semantic concept extraction from speech, IEEE Spoken Language Technology Workshop, pp.692-699, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01987740

Y. Bengio, J. Louradour, R. Collobert, and J. Weston, Curriculum learning, International Conference on Machine Learning, pp.41-48, 2009.

Y. Bengio, Deep learning of representations for unsupervised and transfer learning, ICML Workshop on Unsupervised and Transfer Learning, pp.17-37, 2011.

H. Bonneau-Maynard, S. Rosset, C. Ayache, A. Kuhn, and D. Mostefa, Semantic annotation of the French MEDIA dialog corpus, Eurospeech, pp.3456-3459, 2005.

F. Lefèvre, D. Mostefa, L. Besacier, Y. Estève, M. Quignard et al., Leveraging study of robustness and portability of spoken language understanding systems across languages and domains: the PORT-MEDIA corpora, Language Resources and Evaluation Conference, pp.1436-1442, 2012.

A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos et al., Deep Speech: Scaling up end-to-end speech recognition, arXiv preprint arXiv:1412.5567, 2014.

A. Bérard, O. Pietquin, L. Besacier, and C. Servan, Listen and translate: A proof of concept for end-to-end speech-to-text translation, NIPS Workshop on End-to-end Learning for Speech and Audio Processing, 2016.

R. J. Weiss, J. Chorowski, N. Jaitly, Y. Wu, and Z. Chen, Sequence-to-sequence models can directly translate foreign speech, Interspeech, pp.2625-2629, 2017.

A. Bérard, L. Besacier, A. C. Kocabiyikoglu, and O. Pietquin, End-to-end automatic speech translation of audiobooks, ICASSP, pp.6224-6228, 2018.

J. Niehues, R. Cattoni, S. Stüker, M. Cettolo, M. Turchi et al., The IWSLT 2018 evaluation campaign, International Workshop on Spoken Language Translation, pp.2-6, 2018.

D. Serdyuk, Y. Wang, C. Fuegen, A. Kumar, B. Liu et al., Towards end-to-end spoken language understanding, ICASSP, pp.5754-5758, 2018.

J. L. Elman, Learning and development in neural networks: The importance of starting small, Cognition, vol.48, issue.1, pp.71-99, 1993.

E. A. Platanios, O. Stretcu, G. Neubig, B. Poczos, and T. M. Mitchell, Competence-based curriculum learning for neural machine translation, North American Chapter of the Association for Computational Linguistics, pp.1162-1172, 2019.

B. Jabaian, F. Lefèvre, and L. Besacier, Portability of semantic annotations for fast development of dialogue corpora, Interspeech, pp.214-217, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00954198

D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg et al., Deep Speech 2: End-to-end speech recognition in English and Mandarin, International Conference on Machine Learning, pp.173-182, 2016.

A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, International Conference on Machine Learning, pp.369-376, 2006.

S. Hahn, M. Dinarelli, C. Raymond, F. Lefèvre, P. Lehnen et al., Comparing stochastic approaches to spoken language understanding in multiple languages, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.6, pp.1569-1583, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00746965

K. A. Krueger and P. Dayan, Flexible shaping: How learning in small steps helps, Cognition, vol.110, issue.3, pp.380-394, 2009.

Y. Estève, T. Bazillon, J. Antoine, F. Béchet, and J. Farinas, The EPAC corpus: Manual and automatic annotations of conversational speech in French broadcast news, Language Resources and Evaluation Conference, pp.1686-1689, 2010.

S. Galliano, G. Gravier, and L. Chaubard, The ESTER 2 evaluation campaign for the rich transcription of French radio broadcasts, Interspeech, pp.2543-2546, 2009.

G. Gravier, G. Adda, N. Paulson, M. Carré, A. Giraudel et al., The ETAPE corpus for the evaluation of speech-based TV content processing in the French language, Language Resources and Evaluation Conference, pp.114-118, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00712591

C. Grouin, S. Rosset, P. Zweigenbaum, K. Fort, O. Galibert et al., Proposal for an extension of traditional named entities: From guidelines to evaluation, an overview, Linguistic Annotation Workshop, pp.92-100, 2011.

A. Giraudel, M. Carré, V. Mapelli, J. Kahn, O. Galibert et al., The REPERE corpus: a multimodal corpus for person recognition, Language Resources and Evaluation Conference, pp.1102-1107, 2012.

T. Lavergne, O. Cappé, and F. Yvon, Practical very large scale CRFs, Association for Computational Linguistics, pp.504-513, 2010.

A. Nasr, F. Béchet, and J. Rey, Macaon: Une chaîne linguistique pour le traitement de graphes de mots [Macaon: a linguistic processing pipeline for word graphs], Traitement Automatique des Langues Naturelles, 2010.

J. Devlin, M. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, North American Chapter of the Association for Computational Linguistics, 2019.

N. Tomashenko, A. Caubrière, and Y. Estève, Investigating adaptation and transfer learning for end-to-end spoken language understanding from speech, Interspeech, 2019.