D. Bahdanau, K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate, 1409.

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to sequence learning with neural networks, 1409.

P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico et al., Moses: Open Source Toolkit for Statistical Machine Translation, Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, ACL '07, pp.177-180, 2007.
DOI : 10.3115/1557769.1557821

K. Cho, B. van Merriënboer, Ç. Gülçehre, F. Bougares, H. Schwenk et al., Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1078.
DOI : 10.3115/v1/D14-1179
URL : https://hal.archives-ouvertes.fr/hal-01433235

S. Jean, K. Cho, R. Memisevic, and Y. Bengio, On Using Very Large Target Vocabulary for Neural Machine Translation, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 1412.
DOI : 10.3115/v1/P15-1001
URL : http://arxiv.org/pdf/1412.2007

H. Le, I. Oparin, A. Messaoudi, A. Allauzen, J. Gauvain et al., Large vocabulary SOUL neural network language models, INTERSPEECH, 2011.

R. Sennrich, B. Haddow, and A. Birch, Neural Machine Translation of Rare Words with Subword Units, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1508.
DOI : 10.18653/v1/P16-1162

J. Chung, K. Cho, and Y. Bengio, A Character-level Decoder without Explicit Segmentation for Neural Machine Translation, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1603.
DOI : 10.18653/v1/P16-1160

W. Ling, I. Trancoso, C. Dyer, and A. W. Black, Character-based neural machine translation, 1511.

M. R. Costa-jussà and J. A. Fonollosa, Character-based neural machine translation, 1603.

J. A. Bilmes and K. Kirchhoff, Factored language models and generalized parallel backoff, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology companion volume of the Proceedings of HLT-NAACL 2003--short papers, NAACL '03, pp.4-6, 2003.
DOI : 10.3115/1073483.1073485
URL : http://dl.acm.org/ft_gateway.cfm?id=1073485&type=pdf

J. Niehues, T. Ha, E. Cho, and A. Waibel, Using Factored Word Representation in Neural Network Language Models, Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers, pp.74-82, 2016.
DOI : 10.18653/v1/W16-2208

Y. Wu, H. Yamamoto, X. Lu, S. Matsuda, C. Hori et al., Factored recurrent neural network language model in TED lecture transcription, IWSLT, 2012.

R. Sennrich and B. Haddow, Linguistic Input Features Improve Neural Machine Translation, Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers, 1606.
DOI : 10.18653/v1/W16-2209
URL : http://arxiv.org/pdf/1606.02892

O. Firat, K. Cho, and Y. Bengio, Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism, Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
DOI : 10.18653/v1/N16-1101
URL : http://arxiv.org/pdf/1601.01073

A. Nasr, F. Béchet, J. Rey, B. Favre, and J. Le Roux, Macaon, an NLP tool suite for processing word lattices, Proceedings of the ACL-HLT 2011 System Demonstrations, pp.86-91, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00702442

A. Rousseau, XenC: An Open-Source Tool for Data Selection in Natural Language Processing, The Prague Bulletin of Mathematical Linguistics, vol.100, pp.73-82, 2013.
DOI : 10.2478/pralin-2013-0013
URL : https://hal.archives-ouvertes.fr/hal-01353496

M. D. Zeiler, ADADELTA: an adaptive learning rate method, 1212.

R. Pascanu, T. Mikolov, and Y. Bengio, Understanding the exploding gradient problem, 1211.

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS'10). Society for Artificial Intelligence and Statistics, 2010.

K. Papineni, S. Roukos, T. Ward, and W. Zhu, BLEU: a method for automatic evaluation of machine translation, Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pp.311-318, 2002.
DOI : 10.3115/1073083.1073135

T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient Estimation of Word Representations in Vector Space, Proceedings of Workshop at ICLR, 2013.