Y. Bengio, J. Louradour, R. Collobert, and J. Weston, Curriculum learning, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.41-48, 2009.

C. Cardellino, M. Teruel, L. A. Alemany, and S. Villata, Learning Slowly To Learn Better: Curriculum Learning for Legal Ontology Population, Thirtieth International Florida Artificial Intelligence Research Society Conference, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01572442

C. Cardellino, M. Teruel, L. A. Alemany, and S. Villata, A low-cost, high-coverage legal named entity recognizer, classifier and linker, Proceedings of the 16th Edition of the International Conference on Articial Intelligence and Law, ICAIL '17, pp.9-18, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01541446

J. Chiu, N. , and E. , Named entity recognition with bidirectional lstm-cnns, Transactions of the Association for Computational Linguistics, vol.4, pp.357-370, 2016.

J. R. Finkel, T. Grenager, and C. D. Manning, Incorporating non-local information into information extraction systems by gibbs sampling, Proceedings of the 43nd Annual Meeting of the Association for Computational Linguistics (ACL 2005, pp.363-370, 2005.

G. E. Hinton and R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science, vol.313, issue.5786, pp.504-507, 2006.

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, Proceedings of the 32Nd International Conference on International Conference on Machine Learning, vol.37, pp.448-456, 2015.

Y. Kim, Convolutional neural networks for sentence classification, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.1746-1751, 2014.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014.

D. Klein and C. D. Manning, Accurate unlexicalized parsing, Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, vol.1, pp.423-430, 2003.

C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, 2008.

T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality, Proceedings of the 26th International Conference on Neural Information Processing Systems, vol.2, pp.3111-3119, 2013.

A. Rasmus, H. Valpola, M. Honkala, M. Berglund, and T. Raiko, Semi-supervised learning with ladder networks, Proceedings of the 28th International Conference on Neural Information Processing Systems, vol.2, pp.3546-3554, 2015.

F. M. Suchanek, G. Kasneci, and G. Weikum, Yago: A core of semantic knowledge, Proceedings of the 16th International Conference on World Wide Web, WWW '07, pp.697-706, 2007.
URL : https://hal.archives-ouvertes.fr/hal-01472497

S. C. Suddarth and Y. L. Kergosien, Rule-injection hints as a means of improving network performance and learning time, Neural Networks, pp.120-129, 1990.

H. Valpola, Chapter 8 -from neural pca to deep unsupervised learning, Advances in Independent Component Analysis and Learning Machines, pp.143-171, 2015.