Abeillé A., Clément L. & Toussenel F., Building a Treebank for French, Treebanks, pp. 165-187, 2003.

Akbik A., Blythe D. & Vollgraf R., Contextual string embeddings for sequence labeling, Proceedings of the 27th International Conference on Computational Linguistics, pp. 1638-1649, 2018.

Candito M. & Seddah D., Le corpus Sequoia : annotation syntaxique et exploitation pour l'adaptation d'analyseur par pont lexical (The Sequoia corpus: syntactic annotation and use for a parser lexical domain adaptation method), Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, vol. 2, pp. 321-334, 2012.

Chan B., Möller T., Pietsch M., Soni T. & Yeung C. M., 2019.

Charniak E., Introduction to Deep Learning, 2019.

Conneau A., Khandelwal K., Goyal N., Chaudhary V., Wenzek G. et al., Unsupervised cross-lingual representation learning at scale, 2019.

Conneau A., Rinott R., Lample G., Williams A., Bowman S. R., Schwenk H. & Stoyanov V., XNLI: evaluating cross-lingual sentence representations, in E. Riloff, D. Chiang, J. Hockenmaier & J. Tsujii, Éds., Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2475-2485, 2018.

Dai A. M. & Le Q. V., Semi-supervised sequence learning, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems, pp. 3079-3087, 2015.

Delobelle P., Winters T. & Berendt B., RobBERT: a Dutch RoBERTa-based Language Model, 2020.

Devlin J., Chang M.-W., Lee K. & Toutanova K., Multilingual BERT, 2018.

Devlin J., Chang M.-W., Lee K. & Toutanova K., BERT: pre-training of deep bidirectional transformers for language understanding, in J. Burstein, C. Doran & T. Solorio, Éds., Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, vol. 1, pp. 4171-4186, 2019.

Dozat T. & Manning C. D., Deep biaffine attention for neural dependency parsing, 5th International Conference on Learning Representations, 2017.

Grave E., Bojanowski P., Gupta P., Joulin A. & Mikolov T., Learning word vectors for 157 languages, in N. Calzolari et al., Éds., Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), European Language Resources Association (ELRA), 2018.

Grave E., Mikolov T., Joulin A. & Bojanowski P., Bag of tricks for efficient text classification, in M. Lapata, P. Blunsom & A. Koller, Éds., Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, vol. 2, pp. 427-431, 2017.

Howard J. & Ruder S., Universal language model fine-tuning for text classification, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, vol. 1, pp. 328-339, 2018.

Jawahar G., Sagot B. & Seddah D., What does BERT learn about the structure of language?, 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
URL : https://hal.archives-ouvertes.fr/hal-02131630

Joulin A., Grave E., Bojanowski P., Douze M., Jégou H. & Mikolov T., FastText.zip: Compressing text classification models, 2016.

Kingma D. P. & Ba J., Adam: A method for stochastic optimization, 2014.

Lan Z., Chen M., Goodman S., Gimpel K., Sharma P. & Soricut R., ALBERT: A lite BERT for self-supervised learning of language representations, 2019.

Le H., Vial L., Frej J., Segonne V., Coavoux M., Lecouteux B. et al., FlauBERT: Unsupervised language model pre-training for French, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02784776

Liu Y., Ott M., Goyal N., Du J., Joshi M. et al., RoBERTa: A robustly optimized BERT pretraining approach, arXiv:1907.11692, 2019.

Martin L., Muller B., Ortiz Suárez P. J., Dupont Y., Romary L., Villemonte de la Clergerie É. et al., CamemBERT: a Tasty French Language Model, 2019.

Mikolov T., Grave E., Bojanowski P., Puhrsch C. & Joulin A., Advances in pre-training distributed word representations, Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018.

Mikolov T., Sutskever I., Chen K., Corrado G. S. & Dean J., Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems, pp. 3111-3119, 2013.

Nivre J., Abrams M., Agić Ž., Ahrenberg L., Antonsen L., de Marneffe M.-C., de Paiva V., Díaz de Ilarraza A., Dickerson C., Dirix P. et al., Universal Dependencies, Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, 2018.

Ortiz Suárez P. J., Sagot B. & Romary L., Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures, in P. Bański, A. Barbaresi, H. Biber et al., Éds., 7th Workshop on the Challenges in the Management of Large Corpora (CMLC-7), 2019.

Pennington J., Socher R. & Manning C. D., GloVe: Global vectors for word representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1532-1543, 2014.

Peters M. E., Neumann M., Iyyer M., Gardner M., Clark C. et al., Deep contextualized word representations, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 2227-2237, 2018.

Pires T., Schlinger E. & Garrette D., How multilingual is multilingual BERT?, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019.

Radford A., Wu J., Child R., Luan D., Amodei D. & Sutskever I., Language models are unsupervised multitask learners, 2019.

Raffel C., Shazeer N., Roberts A., Lee K., Narang S. et al., Exploring the limits of transfer learning with a unified text-to-text transformer, 2019.

Sagot B., Richard M. & Stern R., Annotation référentielle du corpus arboré de Paris 7 en entités nommées (Referential named entity annotation of the Paris 7 French treebank), in G. Antoniadis, H. Blanchon & G. Sérasset, Éds., Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, vol. 2, pp. 535-542, 2012.

Straka M., Straková J. & Hajič J., Evaluating contextualized embeddings on 54 languages in POS tagging, lemmatization and dependency parsing, 2019.

Straková J., Straka M. & Hajič J., Neural architectures for nested NER through linearization, in A. Korhonen, D. R. Traum & L. Màrquez, Éds., Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, vol. 1, pp. 5326-5331, 2019.

Tenney I., Das D. & Pavlick E., BERT rediscovers the classical NLP pipeline, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4593-4601, 2019.

Virtanen A., Kanerva J., Ilo R., Luoma J., Luotolahti J. et al., Multilingual is not enough: BERT for Finnish, 2019.

Wenzek G., Lachaux M.-A., Conneau A., Chaudhary V., Guzmán F. et al., CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data, 2019.

Williams A., Nangia N. & Bowman S. R., A broad-coverage challenge corpus for sentence understanding through inference, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, vol. 1, pp. 1112-1122, 2018.

Wolf T., Debut L., Sanh V., Chaumond J., Delangue C. et al., HuggingFace's Transformers: State-of-the-art natural language processing, 2019.