D. Bahdanau, K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate, 2014.

Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, A neural probabilistic language model, Journal of Machine Learning Research, vol.3, pp.1137-1155, 2003.

H. Bonneau-Maynard, C. Ayache, F. Bechet, A. Denis, A. Kuhn et al., Results of the French EVALDA-MEDIA evaluation campaign for literal understanding, LREC, pp.2054-2059, 2006.
URL : https://hal.archives-ouvertes.fr/hal-01160167

S. Brants, S. Dipper, P. Eisenberg, S. Hansen-Schirra, E. König et al., TIGER : Linguistic interpretation of a German corpus, Research on Language and Computation, vol.2, issue.4, pp.597-620, 2004.

M. X. Chen, O. Firat, A. Bapna, M. Johnson, W. Macherey et al., The Best of Both Worlds : Combining Recent Advances in Neural Machine Translation, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.76-86, 2018.

K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares et al., Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.1724-1734, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01433235

M. Collins, Three generative, lexicalised models for statistical parsing, Proceedings of ACL, pp.16-23, 1997.

R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu et al., Natural language processing (almost) from scratch, Journal of Machine Learning Research, vol.12, 2011.

Z. Dai, Z. Yang, Y. Yang, J. Carbonell, Q. V. Le et al., Transformer-XL : Attentive Language Models Beyond a Fixed-Length Context, 2019.

R. De Mori, F. Bechet, D. Hakkani-Tur, M. McTear, G. Riccardi et al., Spoken language understanding : A survey, IEEE Signal Processing Magazine, vol.25, pp.50-58, 2008.
URL : https://hal.archives-ouvertes.fr/hal-01314884

M. Dehghani, S. Gouws, O. Vinyals, J. Uszkoreit, and L. Kaiser, Universal Transformers, CoRR, 2018.

J. Devlin, M. Chang, K. Lee, and K. Toutanova, BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding, 2018.

M. Dinarelli, Spoken Language Understanding : from Spoken Utterances to Semantic Structures, PhD thesis, Dipartimento di Ingegneria e Scienza dell'Informazione, University of Trento, 2010.

M. Dinarelli and L. Grobol, Seq2biseq : Bidirectional output-wise recurrent neural networks for sequence modelling, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02085093

M. Dinarelli, A. Moschitti, and G. Riccardi, Concept segmentation and labeling for conversational speech, Proceedings of the International Conference of the Speech Communication Association (Interspeech), 2009.

M. Dinarelli, A. Moschitti, and G. Riccardi, Re-ranking models based on small training data for spoken language understanding, Conference on Empirical Methods in Natural Language Processing, pp.11-18, 2009.

M. Dinarelli, A. Moschitti, and G. Riccardi, Discriminative reranking for spoken language understanding, IEEE TASLP, vol.20, pp.526-539, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01478984

M. Dinarelli and S. Rosset, Hypotheses selection criteria in a reranking framework for spoken language understanding, Conference on Empirical Methods in Natural Language Processing, pp.1104-1115, 2011.

M. Dinarelli and S. Rosset, Tree representations in probabilistic models for extended named entity detection, European Chapter of the Association for Computational Linguistics (EACL), pp.174-184, 2012.

M. Dinarelli and S. Rosset, Tree-structured named entity recognition on OCR data : Analysis, processing and results, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), 2012.
URL : https://hal.archives-ouvertes.fr/hal-01490004

M. Dinarelli and I. Tellier, Improving recurrent neural networks for sequence labelling, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01489976

M. Dinarelli and I. Tellier, New recurrent neural network variants for sequence labeling, Proceedings of the 17th International Conference on Intelligent Text Processing and Computational Linguistics, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01489955

M. Dinarelli, V. Vukotic, and C. Raymond, Label-dependency coding in Simple Recurrent Networks for Spoken Language Understanding, Interspeech, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01553830

Y. Dupont, M. Dinarelli, and I. Tellier, Label-dependencies aware recurrent neural networks, Proceedings of CICling, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01579071

C. Grouin, M. Dinarelli, S. Rosset, G. Wisniewski, and P. Zweigenbaum, Coreference resolution in clinical reports. The LIMSI participation in the i2b2/VA 2011 challenge, Proceedings of i2b2/VA 2011 Coreference Resolution Workshop, 2011.

Q. Guo, X. Qiu, P. Liu, Y. Shao, X. Xue et al., CoRR, 2019.

S. Hahn, M. Dinarelli, C. Raymond, F. Lefèvre, P. Lehnen et al., Comparing stochastic approaches to spoken language understanding in multiple languages, IEEE TASLP, p.99, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00746965

G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, Neural Architectures for Named Entity Recognition, Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics : Human Language Technologies, pp.260-270, 2016.

T. Lavergne and F. Yvon, Learning the structure of variable-order CRFs : a finite-state perspective, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp.433-439, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01710793

O. Levy, K. Lee, N. Fitzgerald, and L. Zettlemoyer, Long short-term memory as a dynamically computed element-wise weighted sum, Proceedings of ACL, pp.732-739, 2018.

X. Ma and E. Hovy, End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF, Proceedings of ACL, 2016.

M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz, Building a large annotated corpus of English : The Penn Treebank, Computational Linguistics, vol.19, issue.2, 1993.

V. Ng and C. Cardie, Improving Machine Learning Approaches to Coreference Resolution, Proceedings of ACL'02, pp.104-111, 2002.

J. Pennington, R. Socher, and C. D. Manning, GloVe : Global vectors for word representation, Empirical Methods in Natural Language Processing, pp.1532-1543, 2014.

M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark et al., Deep Contextualized Word Representations, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics : Human Language Technologies, vol.1, pp.2227-2237, 2018.

S. Quarteroni, G. Riccardi, and M. Dinarelli, What's in an ontology for spoken language understanding, Proceedings of the International Conference of the Speech Communication Association (Interspeech), 2009.

L. Ramshaw and M. Marcus, Text chunking using transformation-based learning, Proceedings of the 3rd Workshop on Very Large Corpora, pp.84-94, 1995.

A. M. Rush, R. Reichart, M. Collins, and A. Globerson, Improved parsing and POS tagging using inter-sentence consistency constraints, Proceedings of EMNLP-CoNLL, 2012.

W. M. Soon, H. T. Ng, and D. C. Lim, A Machine Learning Approach to Coreference Resolution of Noun Phrases, Computational Linguistics, vol.27, issue.4, pp.521-544, 2001.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout : A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, pp.1929-1958, 2014.

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to sequence learning with neural networks, Proceedings of NIPS, 2014.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones et al., Attention is All you Need, Advances in Neural Information Processing Systems, vol.30, pp.5998-6008, 2017.

O. Vinyals, L. Kaiser, T. Koo, S. Petrov, I. Sutskever et al., Grammar As a Foreign Language, Proceedings of the 28th International Conference on Neural Information Processing Systems, vol.2, pp.2773-2781, 2015.

Y. Xia, F. Tian, L. Wu, J. Lin, T. Qin et al., Deliberation networks : Sequence generation beyond one-pass decoding, Advances in Neural Information Processing Systems, vol.30, pp.1784-1794, 2017.

X. Zhang, J. Su, Y. Qin, Y. Liu, R. Ji et al., Asynchronous bidirectional decoding for neural machine translation, 2018.

Y. Zhang, H. Chen, Y. Zhao, Q. Liu, and D. Yin, Learning tag dependencies for sequence tagging, International Joint Conference on Artificial Intelligence (IJCAI), 2018.