M. I. Jordan, Serial order: A parallel, distributed processing approach, Advances in Connectionist Theory: Speech, 1989.
DOI : 10.1016/s0166-4115(97)80111-2

J. L. Elman, Finding Structure in Time, Cognitive Science, vol.14, issue.2, pp.179-211, 1990.
DOI : 10.1207/s15516709cog1402_1

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.
DOI : 10.1162/neco.1997.9.8.1735

T. Mikolov, M. Karafiát, L. Burget, J. Cernocký, and S. Khudanpur, Recurrent neural network based language model, 11th Annual Conference of the International Speech Communication Association, pp.1045-1048, 2010.

T. Mikolov, S. Kombrink, L. Burget, J. Cernocký, and S. Khudanpur, Extensions of recurrent neural network language model, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5528-5531, 2011.
DOI : 10.1109/ICASSP.2011.5947611

R. Collobert and J. Weston, A unified architecture for natural language processing, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.160-167, 2008.
DOI : 10.1145/1390156.1390177

R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu et al., Natural language processing (almost) from scratch, Journal of Machine Learning Research, vol.12, pp.2493-2537, 2011.

G. Mesnil, X. He, L. Deng, and Y. Bengio, Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding, 2013.

V. Vukotic, C. Raymond, and G. Gravier, Is it time to switch to word embedding and recurrent neural networks for spoken language understanding?, InterSpeech, 2015.

Y. Bengio, P. Simard, and P. Frasconi, Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks, vol.5, issue.2, pp.157-166, 1994.
DOI : 10.1109/72.279181
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.7128

K. Cho, B. van Merriënboer, Ç. Gülçehre, F. Bougares, H. Schwenk et al., Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.1724-1734, 2014.
DOI : 10.3115/v1/D14-1179
URL : https://hal.archives-ouvertes.fr/hal-01433235

Z. Huang, W. Xu, and K. Yu, Bidirectional LSTM-CRF models for sequence tagging, arXiv preprint arXiv:1508.01991, 2015.

G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, Neural architectures for named entity recognition, arXiv preprint, 2016.
DOI : 10.18653/v1/n16-1030
URL : http://arxiv.org/abs/1603.01360

X. Ma and E. Hovy, End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016.
DOI : 10.18653/v1/P16-1101
URL : http://arxiv.org/abs/1603.01354

J. Pennington, R. Socher, and C. D. Manning, GloVe: Global Vectors for Word Representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.1532-1543, 2014.
DOI : 10.3115/v1/D14-1162
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.645.8863

T. Lavergne, O. Cappé, and F. Yvon, Practical very large scale CRFs, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pp.504-513, 2010.

M. Dinarelli and S. Rosset, Models cascade for tree-structured named entity detection, Proceedings of International Joint Conference of Natural Language Processing (IJCNLP), 2011.

M. Dinarelli and I. Tellier, Improving recurrent neural networks for sequence labelling, arXiv preprint arXiv:1606.02555, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01489976

J. Lafferty, A. McCallum, and F. Pereira, Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proceedings of the Eighteenth International Conference on Machine Learning (ICML), pp.282-289, 2001.

M. Dinarelli and I. Tellier, New recurrent neural network variants for sequence labeling, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01489955

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, pp.1929-1958, 2014.

K. Toutanova, D. Klein, C. D. Manning, and Y. Singer, Feature-rich part-of-speech tagging with a cyclic dependency network, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, NAACL '03, pp.173-180, 2003.
DOI : 10.3115/1073445.1073478

L. Shen, G. Satta, and A. Joshi, Guided learning for bidirectional sequence classification, Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pp.760-767, 2007.

R. De Mori, F. Bechet, D. Hakkani-Tur, M. McTear, G. Riccardi et al., Spoken language understanding, IEEE Signal Processing Magazine, vol.25, issue.3, pp.50-58, 2008.
DOI : 10.1109/MSP.2008.918413
URL : https://hal.archives-ouvertes.fr/hal-01314884

D. A. Dahl, M. Bates, M. Brown, W. Fisher, K. Hunicke-Smith et al., Expanding the scope of the ATIS task, Proceedings of the workshop on Human Language Technology, HLT '94, pp.43-48, 1994.
DOI : 10.3115/1075812.1075823

H. Bonneau-Maynard, C. Ayache, F. Bechet, A. Denis, A. Kuhn et al., Results of the French EVALDA-MEDIA evaluation campaign for literal understanding, pp.2054-2059, 2006.
URL : https://hal.archives-ouvertes.fr/hal-01160167

T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient estimation of word representations in vector space, arXiv preprint arXiv:1301.3781, 2013.

D. Chen and C. Manning, A Fast and Accurate Dependency Parser using Neural Networks, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.740-750, 2014.
DOI : 10.3115/v1/D14-1082
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.654.8984

T. Mikolov, W. Yih, and G. Zweig, Linguistic regularities in continuous space word representations, Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, pp.746-751, 2013.

Y. Bengio, Practical Recommendations for Gradient-Based Training of Deep Architectures, arXiv preprint arXiv:1206.5533, 2012.
DOI : 10.1162/089976602317318938

P. Werbos, Backpropagation through time: what it does and how to do it, Proceedings of the IEEE, vol.78, issue.10, pp.1550-1560, 1990.
DOI : 10.1109/5.58337

J. P. Chiu and E. Nichols, Named entity recognition with bidirectional LSTM-CNNs, arXiv preprint arXiv:1511.08308, 2015.

M. Schuster and K. Paliwal, Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing, vol.45, issue.11, pp.2673-2681, 1997.
DOI : 10.1109/78.650093
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.331.9441

C. Raymond and G. Riccardi, Generative and discriminative algorithms for spoken language understanding, Proceedings of the International Conference of the Speech Communication Association (Interspeech), pp.1605-1608, 2007.

G. Mesnil, Y. Dauphin, K. Yao, Y. Bengio, L. Deng et al., Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015.
DOI : 10.1109/TASLP.2014.2383614

L. Ramshaw and M. Marcus, Text Chunking Using Transformation-Based Learning, Proceedings of the 3rd Workshop on Very Large Corpora, pp.84-94, 1995.
DOI : 10.1007/978-94-017-2390-9_10
URL : http://arxiv.org/abs/cmp-lg/9505040

Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, Neural Probabilistic Language Models, Journal of Machine Learning Research, vol.3, pp.1137-1155, 2003.
DOI : 10.1007/3-540-33486-6_6
URL : https://hal.archives-ouvertes.fr/hal-01434258

K. He, X. Zhang, S. Ren, and J. Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015 IEEE International Conference on Computer Vision (ICCV), pp.1026-1034, 2015.
DOI : 10.1109/ICCV.2015.123
URL : http://arxiv.org/pdf/1502.01852

M. Dinarelli and I. Tellier, Étude des réseaux de neurones récurrents pour étiquetage de séquences, Actes de la 23ème conférence sur le Traitement Automatique des Langues Naturelles, 2016.

M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz, Building a large annotated corpus of English: The Penn Treebank, Computational Linguistics, vol.19, pp.313-330, 1993.

V. Vukotic, C. Raymond, and G. Gravier, A Step Beyond Local Observations with a Dialog Aware Bidirectional GRU Network for Spoken Language Understanding, Interspeech 2016, 2016.
DOI : 10.21437/Interspeech.2016-1301
URL : https://hal.archives-ouvertes.fr/hal-01351733

M. Dinarelli, A. Moschitti, and G. Riccardi, Discriminative Reranking for Spoken Language Understanding, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.2, pp.526-539, 2011.
DOI : 10.1109/TASL.2011.2162322
URL : https://hal.archives-ouvertes.fr/hal-01478984

M. Dinarelli and S. Rosset, Hypotheses selection criteria in a reranking framework for spoken language understanding, Conference of Empirical Methods for Natural Language Processing, pp.1104-1115, 2011.

S. Hahn, M. Dinarelli, C. Raymond, F. Lefèvre, P. Lehnen et al., Comparing Stochastic Approaches to Spoken Language Understanding in Multiple Languages, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.6, p.99, 2010.
DOI : 10.1109/TASL.2010.2093520
URL : https://hal.archives-ouvertes.fr/hal-00746965

R. Herbrich, T. Graepel, and K. Obermayer, Large Margin Rank Boundaries for Ordinal Regression, 2000.

S. Hahn, P. Lehnen, G. Heigold, and H. Ney, Optimizing CRFs for SLU tasks in various languages using modified training criteria, Proceedings of the International Conference of the Speech Communication Association (Interspeech), 2009.

J. G. Fiscus, A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER), 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings, pp.347-352, 1997.
DOI : 10.1109/ASRU.1997.659110
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.5624