H. Afli, L. Barrault, and H. Schwenk, Parallel Texts Extraction from Multimodal Comparable Corpora, Advances in Natural Language Processing, pp.40-51, 2012.
DOI : 10.1007/978-3-642-33983-7_5

H. Afli, L. Barrault, and H. Schwenk, Multimodal comparable corpora as resources for extracting parallel data: Parallel phrases extraction, Proceedings of IJCNLP, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01433457

A. Axelrod, X. He, and J. Gao, Domain adaptation via pseudo in-domain data selection, Proceedings of EMNLP, 2011.

M. Cettolo, C. Girardi, and M. Federico, WIT3: Web inventory of transcribed and translated talks, Proceedings of EAMT, 2012.

J. Chen, J. Devlin, H. Cao, R. Prasad, and P. Natarajan, Automatic tune set generation for machine translation with limited in-domain data, Proceedings of EAMT, 2012.

M. Eck, S. Vogel, and A. Waibel, Language model adaptation for statistical machine translation based on information retrieval, Proceedings of LREC, 2004.

M. Federico, N. Bertoldi, and M. Cettolo, IRSTLM: an open source toolkit for handling large scale language models, Proceedings of INTERSPEECH, 2008.

G. F. Foster, C. Goutte, and R. Kuhn, Discriminative instance weighting for domain adaptation in statistical machine translation, Proceedings of EMNLP, 2010.

J. Gao, J. Goodman, M. Li, and K.-F. Lee, Toward a unified approach to statistical language modeling for Chinese, ACM Transactions on Asian Language Information Processing, vol.1, issue.1, pp.3-33, 2002.
DOI : 10.1145/595576.595578

G. Gascó, M. Rocha, G. Sanchis-Trilles, J. Andrés-Ferrer, and F. Casacuberta, Does more data always yield better translations?, Proceedings of EACL, 2012.

N. Habash, O. Rambow, and R. Roth, MADA+TOKAN: A toolkit for Arabic tokenization, diacritization, morphological disambiguation, POS tagging, stemming and lemmatization, Proceedings of the 2nd International Conference on Arabic Language Resources and Tools (MEDAR), 2009.

A. S. Hildebrand, M. Eck, S. Vogel, and A. Waibel, Adaptation of the translation model for statistical machine translation based on information retrieval, Proceedings of EAMT, 2005.

P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico et al., Moses: Open source toolkit for statistical machine translation, Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, ACL '07, 2007.
DOI : 10.3115/1557769.1557821

W. Lewis and S. Eetemadi, Dramatically reducing training data size through vocabulary saturation, Proceedings of WMT, 2013.

Y. Lu, J. Huang, and Q. Liu, Improving statistical machine translation performance by training data selection and optimization, Proceedings of EMNLP-CoNLL, 2007.

S. Mirkin and N. Cancedda, Assessing quick update methods of statistical translation models, Proceedings of IWSLT, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00953812

R. C. Moore and W. Lewis, Intelligent selection of language model training data, Proceedings of the ACL 2010 Conference Short Papers, 2010.

K. Papineni, S. Roukos, T. Ward, and W. Zhu, BLEU: a method for automatic evaluation of machine translation, Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, 2002.
DOI : 10.3115/1073083.1073135

Y. Song, P. Klassen, F. Xia, and C. Kit, Entropy-based training data selection for domain adaptation, Proceedings of COLING (Posters), 2012.

J. Tiedemann, Parallel data, tools and interfaces in OPUS, Proceedings of LREC, 2012.

K. Yasuda, R. Zhang, H. Yamamoto, and E. Sumita, Method of selecting training data to build a compact and efficient translation model, Proceedings of IJCNLP, 2008.