We filtered the list to obtain a gold standard in which each phrase pair contains at least one multi-word phrase, which leaves 95 phrase pairs. The SemEval-2013 Task 5 gold standard was originally a binary reference in which 7,814 phrase pairs are tagged as "positive" or "negative" for the similarity
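The multi-word filter described above can be sketched in a few lines. This is a minimal illustration, not the original pipeline: it assumes each gold-standard entry is a pair of strings and treats a phrase as multi-word when it has more than one whitespace-separated token (so hyphenated terms count as single words). The helper names `is_multiword` and `keep_multiword_pairs` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical layout: one gold-standard entry = (phrase_a, phrase_b).
PhrasePair = Tuple[str, str]


def is_multiword(phrase: str) -> bool:
    """Assumption: a phrase is multi-word if it has more than one whitespace token."""
    return len(phrase.split()) > 1


def keep_multiword_pairs(pairs: List[PhrasePair]) -> List[PhrasePair]:
    """Keep only pairs in which at least one side is a multi-word phrase."""
    return [(a, b) for a, b in pairs if is_multiword(a) or is_multiword(b)]


if __name__ == "__main__":
    toy = [
        ("cancer", "tumour"),                    # both single-word: dropped
        ("wind turbine", "turbine"),             # one multi-word side: kept
        ("breast cancer", "mammary carcinoma"),  # both multi-word: kept
    ]
    print(keep_multiword_pairs(toy))
    # [('wind turbine', 'turbine'), ('breast cancer', 'mammary carcinoma')]
```

Requiring only one multi-word side (rather than both) keeps pairs that relate a term to a shorter variant, which matches the "at least one multi-word phrase" criterion stated in the text.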

A.3.3 Phrase alignment

The lists consist of 248 word pairs for BC and 139 for WE. The reference list for unified multi-word phrase alignment in WE is built from the term list provided by the project website. The final list contains 73 phrase pairs, but each pair has multiple variant translations; in our setting, all of these variants are considered gold translations⁷. The list was derived from 277 one-to-one mapping pairs. Using these pairs directly would bias the results towards terms with multiple translations, so we factorized them by source term and obtained a list of 73 one-to-many mapping pairs. The reference list for the Italian/English task is the same as in Artetxe, Labaka and Agirre (2016) and contains 1,500 entries. The candidate list is also extracted with PKE, following the same pipeline as in the monolingual tasks; therefore, as in the phrase synonymy task, the WE-English and WE-French corpora contain 8,923 and 6,412 phrases, respectively. Apart from the new reference list that we built, the reference lists for single-word phrase alignment on the BC and WE corpora are the same as those used in Hazem and Morin (2016).
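The factorization step above, turning one-to-one (source, target) pairs into one entry per source term with a set of gold variants, can be sketched as follows. Grouping means each source term is scored once, however many translations it has, which is the bias the text wants to avoid. The `hit_at_k` scorer is an illustrative assumption (the metric is not specified in this section): it counts a source term as correct if any of its gold variants appears among the top-k candidates. All names and the toy terms are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def factorize(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group one-to-one (source, target) pairs into source -> set of gold targets."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for source, target in pairs:
        grouped[source].add(target)
    return dict(grouped)


def hit_at_k(ranked: Dict[str, List[str]], gold: Dict[str, Set[str]], k: int) -> float:
    """Share of source terms whose top-k candidates contain at least one gold variant."""
    if not gold:
        return 0.0
    hits = sum(
        1
        for source, targets in gold.items()
        if any(candidate in targets for candidate in ranked.get(source, [])[:k])
    )
    return hits / len(gold)


if __name__ == "__main__":
    one_to_one = [
        ("wind turbine", "éolienne"),
        ("wind turbine", "turbine éolienne"),
        ("rotor blade", "pale de rotor"),
    ]
    gold = factorize(one_to_one)
    print(len(gold))  # 2 one-to-many entries from 3 one-to-one pairs
    ranked = {
        "wind turbine": ["aérogénérateur", "turbine éolienne"],
        "rotor blade": ["moyeu", "rotor"],
    }
    print(hit_at_k(ranked, gold, 1))  # 0.0
    print(hit_at_k(ranked, gold, 2))  # 0.5
```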

From this dictionary, we selected a subset of 3,007 entries for the BC corpus and a subset of 2,745 entries for the WE corpus, using a word frequency threshold of 5. These two subsets are used as training data in our word embedding mapping experiments. For the Italian/English experiments, we use only the same seed lexicon⁹ as used in Artetxe
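The frequency-based selection of seed-lexicon entries can be sketched as below. The text only says "a word frequency threshold of 5", so two details here are assumptions: the threshold is treated as inclusive (at least 5 occurrences), and it is applied to both the source and the target word of each entry. Corpus counts are assumed to be available as `collections.Counter` objects; `filter_seed_lexicon` is a hypothetical helper name.

```python
from collections import Counter
from typing import List, Tuple


def filter_seed_lexicon(
    entries: List[Tuple[str, str]],
    src_counts: Counter,
    tgt_counts: Counter,
    min_freq: int = 5,
) -> List[Tuple[str, str]]:
    """Keep (source, target) entries whose two words both occur at least min_freq times.

    Assumptions: the threshold is inclusive and applied to both sides.
    Counter returns 0 for unseen words, so entries absent from a corpus are dropped.
    """
    return [
        (src, tgt)
        for src, tgt in entries
        if src_counts[src] >= min_freq and tgt_counts[tgt] >= min_freq
    ]


if __name__ == "__main__":
    src_counts = Counter({"cancer": 12, "breast": 7, "biopsy": 3})
    tgt_counts = Counter({"cancer": 15, "sein": 9, "biopsie": 8})
    lexicon = [("cancer", "cancer"), ("breast", "sein"), ("biopsy", "biopsie"), ("tumor", "tumeur")]
    print(filter_seed_lexicon(lexicon, src_counts, tgt_counts))
    # [('cancer', 'cancer'), ('breast', 'sein')]
```

Filtering on both sides ensures that every training pair has reasonably well-estimated vectors in both embedding spaces before a mapping is learned.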

The reference list and evaluation software are available here.

Regarding the embedding models for the domain-specific corpora, we use deeplearning4j¹¹ to train 100-dimensional domain-specific word embeddings with the Skip-gram model.
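Skip-gram learns a vector for each word by predicting the words around it, so its training instances are (center, context) pairs drawn from a sliding window. The pure-Python sketch below shows only this pair-generation step. It is not the deeplearning4j API, and the window size is an assumption because the text does not state it. Full implementations such as word2vec add further refinements (for example a randomly shrunk window, frequent-word subsampling and negative sampling) that are omitted here.

```python
from typing import Iterator, List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> Iterator[Tuple[str, str]]:
    """Yield (center, context) pairs: each token is paired with every
    neighbour at most `window` positions away (the token itself excluded)."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


if __name__ == "__main__":
    print(list(skipgram_pairs(["wind", "turbine", "blade"], window=1)))
    # [('wind', 'turbine'), ('turbine', 'wind'), ('turbine', 'blade'), ('blade', 'turbine')]
```

Each pair becomes one prediction target during training; the learned input vectors (here of dimension 100) are the word embeddings used downstream.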

A.5.2 Language models

We incorporated the BERT¹² and ELMo¹³ implementations because both provide pretrained models for multiple languages. The BERT implementation offers a single multilingual model covering 104 languages, whereas the ELMo implementation provides 44 separate language models.

M. Artetxe, G. Labaka, and E. Agirre, Learning principled bilingual mappings of word embeddings while preserving monolingual invariance, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP'16), pp.2289-2294, 2016.

M. Artetxe, G. Labaka, and E. Agirre, Generalizing and Improving Bilingual Word Embedding Mappings with a Multi-Step Framework of Linear Transformations, Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI'18), pp.5012-5019, 2018.

M. Artetxe, G. Labaka, and E. Agirre, Unsupervised Statistical Machine Translation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP'18), pp.3632-3642, 2018.

M. Artetxe, G. Labaka, E. Agirre, and K. Cho, Unsupervised Neural Machine Translation, Proceedings of the 6th International Conference on Learning Representations (ICLR'18), 2018.

A. Axelrod, X. He, and J. Gao, Domain Adaptation via Pseudo In-Domain Data Selection, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP'11), pp.355-362, 2011.

D. Bahdanau, K. Cho, and Y. Bengio, Neural Machine Translation by Jointly Learning to Align and Translate, 2014.

M. Baroni and A. Lenci, Distributional Memory: A General Framework for Corpus-Based Semantics, Computational Linguistics, vol.36, issue.4, pp.673-721, 2010.

P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, Enriching Word Vectors with Subword Information, Transactions of the Association for Computational Linguistics, vol.5, pp.135-146, 2017.

J. Botha and P. Blunsom, Compositional Morphology for Word Representations and Language Modelling, Proceedings of the 31st International Conference on Machine Learning (ICML'14), vol.32, pp.1899-1907, 2014.

D. Bouamor, N. Semmar, and P. Zweigenbaum, Context Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL'13), pp.759-764, 2013.
URL : https://hal.archives-ouvertes.fr/cea-01844697

J. Bromley, I. Guyon, Y. LeCun, E. Säckinger, and R. Shah, Signature Verification using a "Siamese" Time Delay Neural Network, International Journal of Pattern Recognition and Artificial Intelligence, vol.7, issue.4, pp.669-688, 1993.

J. A. Bullinaria and J. P. Levy, Extracting semantic representations from word co-occurrence statistics: A computational study, Behavior Research Methods, vol.39, pp.510-526, 2007.

J. Camacho-Collados, M. T. Pilehvar, and R. Navigli, NASARI: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities, Artificial Intelligence, vol.240, pp.36-64, 2016.

Y. Chen, Y. Liu, Y. Cheng, and V. O. K. Li, A Teacher-Student Framework for Zero-Resource Neural Machine Translation, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL'17), pp.1925-1935, 2017.

Y. Chiao and P. Zweigenbaum, Looking for Candidate Translational Equivalents in Specialized, Comparable Corpora, Proceedings of the 19th International Conference on Computational Linguistics (COLING'02), pp.1-5, 2002.

K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares et al., Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP'14), pp.1724-1734, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01433235

J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, 2014.

K. W. Church and P. Hanks, Word Association Norms, Mutual Information, and Lexicography, Computational Linguistics, vol.16, issue.1, pp.22-29, 1990.

I. Dagan, F. Pereira, and L. Lee, Similarity-based Estimation of Word Cooccurrence Probabilities, Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics (ACL'94), pp.272-278, 1994.

A. M. Dai and Q. V. Le, Semi-supervised Sequence Learning, Advances in Neural Information Processing Systems 28 (NIPS'15), pp.3079-3087, 2015.

Z. Dai, Z. Yang, Y. Yang, J. Carbonell, Q. V. Le et al., Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context, 2019.

A. Das, H. Yenala, M. Chinnakotla, and M. Shrivastava, Together we stand: Siamese Networks for Similar Question Retrieval, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL'16), pp.378-387, 2016.

S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, Indexing by latent semantic analysis, Journal of the American Society for Information Science, vol.41, issue.6, pp.391-407, 1990.

M. Del, A. Tättar, and M. Fishel, Phrase-based Unsupervised Machine Translation with Compositional Phrase Embeddings, Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pp.361-367, 2018.

J. Devlin, M. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2018.

T. Dunning, Accurate Methods for the Statistics of Surprise and Coincidence, Computational Linguistics, pp.61-74, 1993.

J. L. Elman, Finding Structure in Time, Cognitive Science, vol.14, issue.2, pp.179-211, 1990.

S. Evert, The Statistics of Word Cooccurrences: Word Pairs and Collocations, 2005.

R. M. Fano, Transmission of Information: A Statistical Theory of Communications, 1961.

M. Faruqui and C. Dyer, Improving Vector Space Word Representations Using Multilingual Correlation, Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL'14), pp.462-471, 2014.

G. Finch, Linguistic Terms and Concepts, 2000.

O. Firat, B. Sankaran, Y. Al-Onaizan, F. T. Yarman Vural et al., Zero-Resource Translation with Multi-Lingual Neural Machine Translation, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP'16), pp.268-277, 2016.

K. Fukushima, Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, vol.36, pp.193-202, 1980.

P. Fung, Compiling Bilingual Lexicon Entries From a non-Parallel English-Chinese Corpus, Proceedings of the 3rd Annual Workshop on Very Large Corpora (VLC'95), pp.173-183, 1995.

J. Garten, K. Sagae, V. Ustun, and M. Dehghani, Combining Distributed Vector Representations for Words, Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, pp.95-101, 2015.

F. A. Gers, J. Schmidhuber, and F. Cummins, Learning to Forget: Continual Prediction with LSTM, Neural Computation, vol.12, issue.10, pp.2451-2471, 2000.

X. Glorot, A. Bordes, and Y. Bengio, Deep Sparse Rectifier Neural Networks, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS'11), vol.15, pp.315-323, 2011.

J. Goikoetxea, E. Agirre, and A. Soroa, Single or Multiple? Combining Word Representations Independently Learned from Text and WordNet, Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI'16), pp.2608-2614, 2016.

Y. Goldberg and O. Levy, word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method, 2014.

C. Goller and A. Küchler, Learning Task-Dependent Distributed Representations by Backpropagation Through Structure, Proceedings of International Conference on Neural Networks (ICNN'96), pp.347-352, 1996.

I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley et al., Generative Adversarial Nets, Advances in Neural Information Processing Systems 27 (NIPS'14), vol.2, pp.2672-2680, 2014.

S. Gouws, Y. Bengio, and G. Corrado, BilBOWA: Fast Bilingual Distributed Representations Without Word Alignments, Proceedings of the 32nd International Conference on Machine Learning (ICML'15), pp.748-756, 2015.

G. Grefenstette, The World Wide Web as a Resource for Example-Based Machine Translation Tasks, Proceedings of the ASLIB Conference on Translating and the Computer, vol.21, 1999.

K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, LSTM: A Search Space Odyssey, IEEE Transactions on Neural Networks and Learning Systems, vol.28, issue.10, pp.2222-2232, 2017.

C. de Groc, Babouk: Focused Web Crawling for Corpus Compilation and Automatic Terminology Extraction, ACM International Conferences on Web Intelligence and Intelligent Agent Technology, vol.1, pp.497-498, 2011.

Z. Harris, Distributional Structure, Word, vol.10, issue.2-3, pp.146-162, 1954.

A. Hazem and B. Daille, Word Embedding Approach for Synonym Extraction of Multi-Word Terms, Proceedings of the 11th edition of the Language Resources and Evaluation Conference (LREC'18), pp.297-303, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01995257

A. Hazem and E. Morin, Efficient Data Selection for Bilingual Terminology Extraction from Comparable Corpora, Proceedings of the 26th International Conference on Computational Linguistics (COLING'16), pp.3401-3411, 2016.
URL : https://hal.archives-ouvertes.fr/hal-02001789

A. Hazem and E. Morin, Bilingual Word Embeddings for Bilingual Terminology Extraction from Specialized Comparable Corpora, Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP'17), pp.685-693, 2017.

D. He, Y. Xia, T. Qin, L. Wang, N. Yu et al., Dual Learning for Machine Translation, Advances in Neural Information Processing Systems 29 (NIPS'16), pp.820-828, 2016.

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.

D. H. Hubel and T. N. Wiesel, Receptive fields and functional architecture of monkey striate cortex, The Journal of Physiology, vol.195, pp.215-243, 1968.

O. Irsoy and C. Cardie, Deep Recursive Neural Networks for Compositionality in Language, Advances in Neural Information Processing Systems 27 (NIPS'14), pp.2096-2104, 2014.

L. Jakubina and P. Langlais, Reranking Translation Candidates Produced by Several Bilingual Word Similarity Sources, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL'17), pp.605-611, 2017.

M. Johnson et al., Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation, Transactions of the Association for Computational Linguistics, vol.5, pp.339-351, 2017.

R. Jozefowicz, W. Zaremba, and I. Sutskever, An Empirical Exploration of Recurrent Network Architectures, Proceedings of the 32nd International Conference on Machine Learning (ICML'15), vol.37, pp.2342-2350, 2015.

E. L. Keenan and L. M. Faltz, Boolean Semantics for Natural Language, 1985.

T. Kenter, A. Borisov, and M. de Rijke, Siamese CBOW: Optimizing Word Embeddings for Sentence Representations, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL'16), pp.941-951, 2016.

Y. Kim, Y. Jernite, D. Sontag, and A. M. Rush, Character-aware Neural Language Models, Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI'16), pp.2741-2749, 2016.

G. Koch, R. Zemel, and R. Salakhutdinov, Siamese Neural Networks for One-shot Image Recognition, ICML 2015 Deep Learning Workshop, 2015.

L. Kotlerman, I. Dagan, I. Szpektor, and M. Zhitomirsky-Geffet, Directional Distributional Similarity for Lexical Inference, Natural Language Engineering, vol.16, issue.4, pp.359-389, 2010.

P. R. Kroeger, Analyzing Grammar: An Introduction, 2005.

G. Lample and A. Conneau, Cross-lingual Language Model Pretraining, 2019.

G. Lample, A. Conneau, L. Denoyer, and M. Ranzato, Unsupervised Machine Translation Using Monolingual Corpora Only, Proceedings of the 6th International Conference on Learning Representations (ICLR'18), 2018.

A. Lazaridou, G. Dinu, and M. Baroni, Hubness and Pollution: Delving into Cross-Space Mapping for Zero-Shot Learning, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP'15), pp.270-280, 2015.

A. Lazaridou, M. Marelli, R. Zamparelli, and M. Baroni, Compositionally Derived Representations of Morphologically Complex Words in Distributional Semantics, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL'13), vol.1, pp.1517-1526, 2013.

P. Le and W. Zuidema, The Inside-Outside Recursive Neural Network model for Dependency Parsing, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP'14), pp.729-739, 2014.

R. Lebret and R. Collobert, Word Embeddings through Hellinger PCA, Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL'14), pp.482-490, 2014.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.

D. Lin, Automatic Retrieval and Clustering of Similar Words, Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics (ACL-COLING '98), vol.2, pp.768-774, 1998.

W. Ling, Y. Tsvetkov, S. Amir, R. Fermandez, C. Dyer et al., Not All Contexts Are Created Equal: Better Word Representations with Variable Attention, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP'15), pp.1367-1372, 2015.

J. Liu, E. Morin, and S. Saldarriaga, Towards a unified framework for bilingual terminology extraction of single-word and multi-word terms, Proceedings of the 27th International Conference on Computational Linguistics (COLING'18), pp.2855-2866, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02001236

M. Luong and C. D. Manning, Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL'16), vol.1, pp.1054-1063, 2016.

M. Luong, H. Pham, and C. D. Manning, Bilingual Word Representations with Monolingual Quality in Mind, Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, pp.151-159, 2015.

M. Luong, H. Pham, and C. D. Manning, Effective Approaches to Attention-based Neural Machine Translation, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP'15), pp.1412-1421, 2015.

C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, 2008.

K. Martin, N. Wiratunga, S. Sani, S. Massie, and J. Clos, A Convolutional Siamese Network for Developing Similarity Knowledge in the SelfBACK Dataset, Proceedings of the Case-Based Reasoning and Deep Learning Workshop (CBRDL'17), pp.85-94, 2017.

W. S. McCulloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, The Bulletin of Mathematical Biophysics, vol.5, issue.4, pp.115-133, 1943.

T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient Estimation of Word Representations in Vector Space, 2013.

T. Mikolov, Q. V. Le, and I. Sutskever, Exploiting Similarities among Languages for Machine Translation, 2013.

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, Distributed Representations of Words and Phrases and their Compositionality, Advances in Neural Information Processing Systems 26 (NIPS'13), pp.3111-3119, 2013.

A. Mnih and K. Kavukcuoglu, Learning Word Embeddings Efficiently with Noise-contrastive Estimation, Advances in Neural Information Processing Systems 26 (NIPS'13), vol.2, pp.2265-2273, 2013.

A. Mogadala and A. Rettinger, Bilingual Word Embeddings from Parallel and Non-parallel Corpora for Cross-Language Text Classification, Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL'16), pp.692-702, 2016.

R. C. Moore and W. Lewis, Intelligent Selection of Language Model Training Data, Proceedings of the ACL 2010 Conference Short Papers, pp.220-224, 2010.

E. Morin and B. Daille, Revising the Compositional Method for Terminology Acquisition from Comparable Corpora, Proceedings of the 24th International Conference on Computational Linguistics (COLING'12), pp.1797-1810, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00766844

Y. Niwa and Y. Nitta, Co-occurrence Vectors from Corpora vs. Distance Vectors from Dictionaries, Proceedings of the 15th Conference on Computational Linguistics (COLING'94), pp.304-309, 1994.

R. Paulus, R. Socher, and C. D. Manning, Global Belief Recursive Neural Networks, Advances in Neural Information Processing Systems 27 (NIPS'14), pp.2888-2896, 2014.

J. Pennington, R. Socher, and C. Manning, GloVe: Global Vectors for Word Representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP'14), pp.1532-1543, 2014.

M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark et al., Deep Contextualized Word Representations, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL'18), pp.2227-2237, 2018.

S. Qiu, Q. Cui, J. Bian, B. Gao, and T. Liu, Co-learning of Word Representations and Morpheme Representations, Proceedings of the 25th International Conference on Computational Linguistics: Technical Papers (COLING'14), pp.141-150, 2014.

A. Radford, J. Wu, R. Child, D. Luan, D. Amodei et al., Language Models are Unsupervised Multitask Learners, 2019.

R. Rapp, Automatic Identification of Word Translations from Unrelated English and German Corpora, Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL'99), pp.519-526, 1999.

X. Robitaille, Y. Sasaki, M. Tonoike, S. Sato, and T. Utsuro, Compiling French-Japanese Terminologies from the Web, Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL'06), pp.225-232, 2006.

F. Rosenblatt, The Perceptron: A Probabilistic Model for Information Storage and Organization in The Brain, Psychological Review, vol.65, pp.386-408, 1958.

A. Saha, M. M. Khapra, S. Chandar, J. Rajendran, and K. Cho, A Correlational Encoder Decoder Architecture for Pivot Based Sequence Generation, Proceedings of the 26th International Conference on Computational Linguistics (COLING'16), pp.109-118, 2016.

N. Schluter, The Word Analogy Testing Caveat, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL'18), pp.242-246, 2018.

T. Schuster, O. Ram, R. Barzilay, and A. Globerson, Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.1, pp.1599-1613, 2019.

R. Sennrich, B. Haddow, and A. Birch, Improving Neural Machine Translation Models with Monolingual Data, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL'16), pp.86-96, 2016.

R. Sennrich, B. Haddow, and A. Birch, Neural Machine Translation of Rare Words with Subword Units, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL'16), pp.1715-1725, 2016.

C. E. Shannon and W. Weaver, The mathematical theory of communication, 1949.

Y. Shigeto, I. Suzuki, K. Hara, M. Shimbo, and Y. Matsumoto, Ridge Regression, Hubness, and Zero-Shot Learning, Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD'15), pp.135-151, 2015.

S. L. Smith, D. H. P. Turban, S. Hamblin, and N. Y. Hammerla, Offline bilingual word vectors, orthogonal transformations and the inverted softmax, Proceedings of the 5th International Conference on Learning Representations (ICLR'17), 2017.

N. Sobin, Syntactic Analysis: The Basics, 2010.

R. Socher, J. Bauer, C. D. Manning, and A. Y. Ng, Parsing with Compositional Vector Grammars, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL'13), pp.455-465, 2013.

R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning et al., Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP'13), pp.1631-1642, 2013.

A. Søgaard, Ž. Agić, H. Martínez Alonso, B. Plank, B. Bohnet et al., Inverted indexing for cross-lingual NLP, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP'15), vol.1, pp.1713-1722, 2015.

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to Sequence Learning with Neural Networks, Advances in Neural Information Processing Systems 27 (NIPS'14), pp.3104-3112, 2014.

Z. G. Szabó, Compositionality, The Stanford Encyclopedia of Philosophy, 2017.

T. Tanaka, Measuring the Similarity Between Compound Nouns in Different Languages Using Non-parallel Corpora, Proceedings of the 19th International Conference on Computational Linguistics (COLING'02), pp.1-7, 2002.

P. Turney, Mining the web for synonyms: PMI-IR versus LSA on TOEFL, Proceedings of the Twelfth European Conference on Machine Learning (ECML'01), pp.491-502, 2001.

P. D. Turney and P. Pantel, From Frequency to Meaning: Vector Space Models of Semantics, Journal of Artificial Intelligence Research, vol.37, pp.141-188, 2010.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones et al., Attention Is All You Need, Advances in Neural Information Processing Systems 30 (NIPS'17), pp.5998-6008, 2017.

P. Vincent, H. Larochelle, Y. Bengio, and P. Manzagol, Extracting and Composing Robust Features with Denoising Autoencoders, Proceedings of the 25th International Conference on Machine Learning (ICML'08), pp.1096-1103, 2008.

I. Vulić and M.-F. Moens, Bilingual Distributed Word Representations from Document-aligned Comparable Data, Journal of Artificial Intelligence Research, vol.55, 2016.

L. Wang, Y. Li, and S. Lazebnik, Learning Deep Structure-Preserving Image-Text Embeddings, Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR'16), pp.5005-5013, 2016.

L. Wang, D. F. Wong, L. S. Chao, Y. Lu et al., A Systematic Comparison of Data Selection Criteria for SMT Domain Adaptation, The Scientific World Journal, p.10, 2014.

P. J. Werbos, Backpropagation through time: what it does and how to do it, Proceedings of the IEEE, vol.78, issue.10, pp.1550-1560, 1990.

J. Wieting, M. Bansal, K. Gimpel, and K. Livescu, Charagram: Embedding Words and Sentences via Character n-grams, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP'16), pp.1504-1515, 2016.

R. J. Williams and D. Zipser, Backpropagation: Theory, architectures, and applications, chap. Gradient-Based Learning Algorithms for Recurrent Networks and Their Computational Complexity, pp.433-486, 1995.

J. Wu, X. Wang, and W. Y. Wang, Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL'19), pp.1173-1183, 2019.

C. Xing, D. Wang, C. Liu, and Y. Lin, Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL'15), pp.1006-1011, 2015.

K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville et al., Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, Proceedings of the 32nd International Conference on Machine Learning (ICML'15), vol.37, pp.2048-2057, 2015.

Z. Yang, W. Chen, F. Wang, and B. Xu, Unsupervised Neural Machine Translation with Weight Sharing, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL'18), pp.46-55, 2018.

Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov et al., XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019.

S. Zagoruyko and N. Komodakis, Learning to Compare Image Patches via Convolutional Neural Networks, Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR'15), pp.885-894, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01246261

J. Zhang and C. Zong, Exploiting Source-side Monolingual Data in Neural Machine Translation, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP'16), pp.1535-1545, 2016.

Y. Zhang, D. Gaddy, R. Barzilay, and T. Jaakkola, Ten Pairs to Tag - Multilingual POS Tagging via Coarse Mapping between Embeddings, Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL'16), pp.1307-1317, 2016.