J. Bresnan, A. Asudeh, I. Toivonen, and S. Wechsler, Lexical-Functional Syntax. Blackwell Textbooks in Linguistics, 2015.

J. Laurent, G. Denhières, C. Passerieux, G. Iakimova, and M. Hardy-Baylé, On understanding idiomatic language: The salience hypothesis assessed by ERPs, Brain Research, vol.1068, issue.1, pp.151-160, 2006.
URL : https://hal.archives-ouvertes.fr/hal-01440644

R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, 1999.

M. Baroni, S. Bernardini, A. Ferraresi, and E. Zanchetta, The WaCky Wide Web: A collection of very large linguistically processed Web-crawled corpora, Journal of Language Resources and Evaluation, vol.43, pp.209-226, 2009.

B. Bassetti, Effects of writing systems on second language awareness: Word awareness in English learners of Chinese as a foreign language, Second Language Writing Systems. Multilingual Matters, pp.335-356, 2005.

G. Booij, The Oxford Handbook of Construction Grammar, pp.255-273, 2013.

D. Cai and H. Zhao, Neural Word Segmentation Learning for Chinese, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp.409-420, 2016.

J. Colson, The IdiomSearch Experiment: Extracting Phraseology from a Probabilistic Network of Constructions, pp.16-28, 2017.

W. Croft, Radical Construction Grammar: Syntactic Theory in Typological Perspective, 2001.

W. Croft, The Oxford Handbook of Construction Grammar, pp.211-232, 2013.

R. M. W. Dixon and A. Y. Aikhenvald, Word: A Cross-Linguistic Typology, 2002.

R. Artstein and M. Poesio, Inter-coder agreement for Computational Linguistics, Computational Linguistics, vol.34, issue.4, pp.555-596, 2008.

T. Baldwin and S. Kim, Multiword expressions, Handbook of Natural Language Processing, pp.267-292, 2010.

J. Benesty, J. Chen, Y. Huang, and I. Cohen, Pearson correlation coefficient, Noise reduction in speech processing, pp.1-4, 2009.

P. Berkhin, A survey of clustering data mining techniques, Grouping multidimensional data, pp.25-71, 2006.

D. Chila-Markopoulou, Modern Greek comparative constructions: A syntactic analysis of adjectival and adverbial comparatives (in Greek), 1986.

A. Fazly and S. Stevenson, Distinguishing subtypes of multiword expressions using linguistically-motivated statistical measures, Proceedings of the Workshop on A Broader Perspective on Multiword Expressions, pp.9-16, 2007.

A. Fazly, P. Cook, and S. Stevenson, Unsupervised type and token identification of idiomatic expressions, Computational Linguistics, vol.35, issue.1, pp.61-103, 2009.

K. Geeraert, R. H. Baayen, and J. Newman, Understanding idiomatic variation, Proceedings of the 13th Workshop on Multiword Expressions, pp.80-90, 2017.

P. Hanks, Similes and sets: The English preposition "like", Languages and Linguistics: Festschrift for Professor Fr. Čermák. Philosophy Faculty of the Charles University, 2005.

M. Israel, J. R. Harding, and V. Tobin, On simile, Language, Culture and Mind, pp.123-135, 2004.

E. Amjadian, D. Inkpen, T. Paribakht, and F. Faez, Local-global vectors to improve unigram terminology extraction, Proceedings of the 5th International Workshop on Computational Terminology (Computerm), pp.2-11, 2016.

M. Arcan, M. Turchi, S. Tonelli, and P. Buitelaar, Enhancing statistical machine translation with bilingual terminology in a CAT environment, Proceedings of the 11th Biennial Conference of the Association for Machine Translation in the Americas (AMTA), pp.54-68, 2014.

R. Basili, G. De Rossi, and M. T. Pazienza, Inducing terminology for lexical acquisition, Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing (EMNLP), 1997.

K. Aharodnik, A. Feldman, and J. Peng, Designing a Russian Idiom-Annotated Corpus, Proceedings of the Language Resources and Evaluation Conference (LREC), 2018.

M. Akbari, Strategies for translating idioms, Journal of Academic and Applied Studies (Special Issue on Applied Linguistics), vol.3, issue.8, pp.32-41, 2013.

F. Al-Shargi, A. Kaplan, R. Eskander, N. Habash, and O. Rambow, Morphologically Annotated Corpora and Morphological Analyzers for Moroccan and Sanaani Yemeni Arabic, Proceedings of the Language Resources and Evaluation Conference (LREC), 2016.
URL : https://hal.archives-ouvertes.fr/hal-01349201

L. Bäckström, B. Lyngfelt, and E. Sköldberg, Towards interlingual constructicography: On correspondence between constructicon resources for English and Swedish, Constructions and Frames, vol.6, issue.1, pp.9-33, 2014.

M. Baker, In Other Words: A Coursebook on Translation. Routledge, 1992.

L. Banarescu, C. Bonial, S. Cai, M. Georgescu, K. Griffitt et al., Abstract Meaning Representation for Sembanking, Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pp.178-186, 2013.

F. Bond and R. Foster, Linking and extending an open multilingual wordnet, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, vol.1, pp.1352-1362, 2013.

C. Bonial, B. Badarau, K. Griffitt, U. Hermjakob, K. Knight et al., Abstract Meaning Representation of Constructions: The More We Include, the Better the Representation, Proceedings of the 2018 Language Resources and Evaluation Conference (LREC), 2018.

H. Bouamor, N. Habash, and K. Oflazer, The Multidialectal Parallel Corpus of Arabic, Proceedings of the Language Resources and Evaluation Conference (LREC), 2014.

H. Bouamor, N. Habash, M. Salameh, W. Zaghouani, O. Rambow et al., The MADAR Arabic Dialect Corpus and Lexicon, Proceedings of the Language Resources and Evaluation Conference (LREC), 2018.

K. E. Brustad, The Syntax of Spoken Arabic. Georgetown University Press, 2002.

W. Croft and D. Cruse, Cognitive Linguistics, 2004.

Diacritization of Moroccan and Tunisian Arabic Dialects: A CRF Approach, Proceedings of the 3rd Workshop on OpenSource Arabic Corpora and Processing Tools, 2018.

B. Dorr and C. R. Voss, The Case for Systematically Derived Spatial Language Usage, Proceedings of the NAACL 2018 Workshop on Spatial Language Understanding (SpLU), 2018.

A. E. Haloui and S. L. Bowman, Moroccan Arabic Verb Dictionary, 2011.

A. Fazly, P. Cook, and S. Stevenson, Unsupervised Type and Token Identification of Idiomatic Expressions, Computational Linguistics, pp.61-103, 2009.

C. Fellbaum, WordNet: An Electronic Lexical Database, 1998.

C. Fernando, Idioms and Idiomaticity, p.5, 1996.

C. Fillmore, P. Kay, and M. C. O'Connor, Regularity and idiomaticity in grammatical constructions: The case of let alone, Language, pp.501-538, 1988.

C. Fillmore, R. Lee-Goldman, and R. Rhodes, The FrameNet Constructicon, Sign-Based Construction Grammar, pp.309-372, 2012.

C. Fillmore, Border conflicts: Framenet meets construction grammar, Proceedings of the XIII EURALEX international congress, vol.4968, 2008.

M. Forsberg, R. Johansson, L. Bäckström, L. Borin, B. Lyngfelt et al., From construction candidates to constructicon entries: An experiment using semi-automatic methods for identifying constructions in corpora, Constructions and Frames, vol.6, issue.1, pp.114-135, 2014.

A. Goldberg, Constructions: A construction grammar approach to argument structure, 1995.

A. Goldberg, Constructions: a new theoretical approach to language, TRENDS in Cognitive Sciences, vol.7, 2003.

A Dictionary of Moroccan Arabic: Moroccan-English English-Moroccan, 1966.

C. Hashimoto and D. Kawahara, Construction of an idiom corpus and its application to idiom identification based on WSD incorporating idiom-specific features, Proceedings of the Empirical Methods for Natural Language Processing Conference (EMNLP), 2008.

K. Mrini and F. Bond, Building the Moroccan Darija Wordnet (MDW) using Bilingual Resources, Proceedings of the International Conference on Natural Language, Signal and Speech Processing, 2017.

G. Muzny and L. Zettlemoyer, Automatic Idiom Identification in Wiktionary, Proceedings of the Empirical Methods for Natural Language Processing Conference (EMNLP), 2013.

G. Nunberg, I. A. Sag, and T. Wasow, Idioms, Language, vol.70, issue.3, pp.491-538, 1994.

K. Ohara, Toward constructicon building for Japanese in Japanese FrameNet, Revista Veredas, issue.1, p.17, 2016.

J. Peng and A. Feldman, Automatic Idiom Recognition with Word Embeddings, Proceedings of the Annual International Symposium on Information Management and Big Data, 2017.

Priyanka and R. M. Sinha, A System for Identification of Idioms in Hindi, Seventh International Conference on Contemporary Computing (IC3), 2014.

I. A. Sag, T. Baldwin, F. Bond, A. Copestake, and D. Flickinger, Multiword Expressions: A Pain in the Neck for NLP, Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing (CICLing'02), 2002.

Y. Samih and W. Maier, An Arabic-Moroccan Darija Code-Switched Corpus, Proceedings of the Language Resources and Evaluation Conference (LREC), 2016.

Y. Samih and W. Maier, Detecting Code-Switching in Moroccan Arabic Social Media, SocialNLP Workshop at International Joint Conference on Artificial Intelligence (IJCAI), 2016.

A. Shojaei, Translation of Idioms and Fixed Expressions: Strategies and Difficulties. Theory and Practice in Language Studies, vol.2, 2012.

G. F. Simons and C. D. Fennig, Ethnologue: Languages of the World, Twenty-first edition, 2018.

T. Takezawa, G. Kikui, M. Mizushima, and E. Sumita, Multilingual Spoken Language Corpus Development for Communication Research, International Journal of Computational Linguistics & Chinese Language Processing: Special Issue, vol.12, issue.3, pp.303-324, 2007.

L. Talmy, Lexicalization patterns: Semantic structure in lexical forms. Language typology and syntactic description, vol.3, pp.36-149, 1985.

L. Talmy, Toward a cognitive semantics, vol.2, 2000.

T. T. Torrent, L. M. Lage, T. F. Sampaio, T. da Silva Tavares et al., Revisiting border conflicts between FrameNet and Construction Grammar: Annotation policies for the Brazilian Portuguese Constructicon, Constructions and Frames, vol.6, issue.1, pp.34-51, 2014.

S. Vietri, The Lexicon-Grammar of Italian Idioms, Proceedings of the International Conference on Computational Linguistics (COLING), 2014.
DOI : 10.3115/v1/w14-5817
URL : https://hal.archives-ouvertes.fr/hal-01414494

C. R. Voss, S. Tratz, J. Laoudi, and D. Briesch, Finding Romanized Arabic Dialect in Code-Mixed Tweets, Proceedings of the Language Resources and Evaluation Conference (LREC), 2014.

M. A. Yaghan, "Arabizi": A Contemporary Style of Arabic Slang, Design Issues, vol.24, issue.2, pp.39-52, 2008.

R. Zbib, E. Malchiodi, J. Devlin, D. Stallard, S. Matsoukas et al., Machine Translation of Arabic Dialects, Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT '12, pp.49-59, 2012.

M. J. Aranzabe, A. Atutxa, and K. Bengoetxea, Automatic conversion of the Basque dependency treebank to universal dependencies, Proceedings of the Workshop on Treebanks and Linguistic Theories, pp.233-241, 2015.

I. Alegria, X. Artola, K. Sarasola, and M. Urkia, Automatic morphological analysis of Basque, Literary and Linguistic Computing, vol.11, pp.193-203, 1996.

I. Alegria, O. Ansa, X. Artola, N. Ezeiza, K. Gojenola et al., Representation and treatment of Multiword Expressions in Basque, Proceedings of the Workshop on Multiword Expressions: Integrating Processing, pp.48-55, 2004.

G. Corpas Pastor, Manual de fraseología española, 1997.

R. Etxepare, Valency and argument structure in the Basque verb, 2003.

A. Gurrutxaga and I. Alegria, Automatic extraction of NV expressions in Basque: basic issues on cooccurrence techniques, Proceedings of the Workshop on Multiword Expressions: from parsing and generation to the real world, pp.2-7, 2011.

A. Gurrutxaga and I. Alegria, Combining different features of idiomaticity for the automatic classification of noun+verb expressions in Basque, Proceedings of the 9th Workshop on Multiword Expressions, pp.116-125, 2013.

U. Inurrieta and I. Aduriz, Rule-based translation of Spanish Verb-Noun combinations into Basque, Proceedings of the 13th Workshop on Multiword Expressions, pp.149-154, 2017.

U. Inurrieta and I. Aduriz, Analysing linguistic information about word combinations for a Spanish-Basque rule-based machine translation system, Multiword Units in Machine Translation and Translation Technologies, pp.39-60

K. Mitxelena, Orotariko Euskal Hiztegia. Euskaltzaindia, the Royal Academy of the Basque language, 1987.

I. Laka Mugarza, A brief grammar of Euskera, the Basque language, 1996.

G. S. Losnegaard, F. Sangati, C. Parra Escartín, A. Savary, S. Bargmann et al., PARSEME survey on MWE resources, 9th International Conference on Language Resources and Evaluation (LREC 2016), pp.2299-2306, 2016.

I. A. Sag, T. Baldwin, F. Bond, A. Copestake, and D. Flickinger, Multiword expressions: a pain in the neck for NLP, International Conference on Intelligent Text Processing and Computational Linguistics, pp.1-15, 2002.

A. Savary, M. Sailer, Y. Parmentier, M. Rosner, V. Rosén et al., PARSEME: PARSing and Multiword Expressions within a European multilingual network, 7th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, 2015.

A. Savary, C. Ramisch, S. Cordeiro, F. Sangati, V. Vincze et al., The PARSEME Shared Task on automatic identification of Verbal Multiword Expressions, Proceedings of the 13th Workshop on Multiword Expressions, pp.31-47, 2017.

A. Savary, C. Ramisch, and S. Cordeiro, Edition 1.1 of the PARSEME Shared Task on automatic identification of Verbal Multiword Expressions, Proceedings of the 14th Workshop on Multiword Expressions, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02152557

R. Urizar, Euskal lokuzioen tratamendu konputazionala, 2012.

I. Zabala, Los predicados complejos en vasco, Las fronteras de la composición en lenguas románicas y en vasco, pp.445-534, 2004.

D. Abusch, On verbs and time. Doctoral dissertation, 1985.

J. F. Allen, M. Swift, and W. Beaumont, Deep semantic analysis of text, Proc. of the 2008 Conference on Semantics in Text Processing, STEP '08, pp.343-354, 2008.

J. F. Allen, Towards a general theory of action and time, Artificial Intelligence, vol.23, pp.123-54, 1984.

L. B. Anderson, The 'perfect' as a universal and as a language-specific category, Tense-aspect: Between semantics and pragmatics, pp.227-264, 1982.

E. Bach, The algebra of events, Linguistics and philosophy, vol.9, issue.1, pp.5-16, 1986.

L. Banarescu, C. Bonial, S. Cai, M. Georgescu, K. Griffitt et al., Abstract Meaning Representation for sembanking, Proc. of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pp.178-186, 2013.

W. Breu, Studies in Language. International Journal sponsored by the Foundation "Foundations of Language", vol.18, pp.23-44, 1994.

H. Bunt and J. Pustejovsky, Annotating temporal and event quantification, Proc. of 5th ISA Workshop, 2010.

N. Chang, D. Gildea, and S. Narayanan, A dynamic model of aspectual composition, Proc. of CogSci, pp.226-231, 1998.

N. Chang, A motor- and image-schematic analysis of aspectual composition, 1997.

B. Comrie, Aspect, 1976.

B. Comrie, Tense, 1985.

W. Croft, Verbs: Aspect and Causal Structure, 2012.

D. R. Dowty, The effects of aspectual class on the temporal structure of discourse: semantics or pragmatics?, Linguistics and Philosophy, vol.9, issue.1, pp.37-61, 1986.

A. Friedrich and A. Palmer, Automatic prediction of aspectual class of verbs in context, Proc. of ACL, 2014.

A. Friedrich, A. Palmer, and M. Pinkal, Situation entity types: Automatic classification of clause-level aspect, Proc. of ACL, pp.1757-1768, 2016.

E. Hinrichs, Temporal anaphora in discourses of English, Linguistics and philosophy, vol.9, issue.1, pp.63-82, 1986.

W. Klein, Time in language, 1994.

R. W. Langacker, Remarks on English aspect, Tense-aspect: Between semantics and pragmatics, pp.265-304, 1982.

R. W. Langacker, Foundations of cognitive grammar: Theoretical prerequisites, vol.1, 1987.

B. Levin, English verb classes and alternations: A preliminary investigation, 1993.

B. Li, Y. Wen, L. Bu, W. Qu, and N. Xue, Annotating The Little Prince with Chinese AMRs, Proc. of LAW X-the 10th Linguistic Annotation Workshop, pp.7-15, 2016.

T. A. Mathew and E. G. Katz, Supervised categorization for habitual versus episodic sentences, Sixth Midwest Computational Linguistics Colloquium, 2009.

C. Matthiessen and J. A. Bateman, Text generation and systemic-functional linguistics: Experiences from English and Japanese, 1991.

N. Migueles-Abraira, R. Agerri, and A. Diaz de Ilarraza, Annotating Abstract Meaning Representations for Spanish, pp.3074-3078, 2018.

M. Moens and M. Steedman, Temporal ontology and temporal reference, Computational Linguistics, vol.14, issue.2, pp.15-28, 1988.

N. Mostafazadeh, A. Grealish, N. Chambers, J. Allen, and L. Vanderwende, CaTeRS: Causal and temporal relation scheme for semantic annotation of event structures, Proc. of the Fourth Workshop on Events, pp.51-61, 2016.

T. O'Gorman, K. Wright-Bettner, and M. Palmer, Richer Event Description: Integrating event coreference with temporal, causal and bridging annotation, Proc. of the 2nd Workshop on Computing News Storylines, pp.47-56, 2016.

T. O'Gorman, M. Regan, K. Griffitt, U. Hermjakob, K. Knight et al., AMR beyond the sentence: the Multi-sentence AMR corpus, Proc. of COLING, 2018.

M. Palmer, D. Gildea, and P. Kingsbury, The Proposition Bank: An annotated corpus of semantic roles, Computational Linguistics, vol.31, issue.1, pp.71-106, 2005.

F. R. Palmer, Mood and modality, 2001.

B. H. Partee, Nominal and temporal semantic structure: Aspect and quantification, vol.3, p.91, 1999.

P. Portner, The progressive in modal semantics, Language, vol.74, issue.4, p.760, 1998.

J. Pustejovsky, J. M. Castaño, R. Ingria, R. Saurí, R. J. Gaizauskas et al., TimeML: Robust specification of event and temporal expressions in text, IWCS-5, Fifth International Workshop on Computational Semantics, 2003.

J. Pustejovsky, H. Bunt, and A. Zaenen, Designing annotation schemes: From theory to model, Handbook of Linguistic Annotation, pp.21-72, 2017.

J. Pustejovsky, ISO-Space: Annotating static and dynamic spatial information, Handbook of Linguistic Annotation, pp.989-1024, 2017.

R. Reichart and A. Rappoport, Tense sense disambiguation: A new syntactic polysemy task, Proc. of the 2010 Conference on Empirical Methods in Natural Language Processing, pp.325-334, 2010.

H. Reichenbach, Elements of symbolic logic, 1947.

S. Rothstein, Telicity, atomicity and the Vendler classification of verbs. Theoretical and Crosslinguistic Approaches to Aspect, pp.43-77, 2008.

C. S. Smith, Time with and without tense, Time and Modality, Studies in Natural Language and Linguistic Theory, pp.227-249, 2008.

Z. Vendler, Verbs and times, The Philosophical Review, vol.66, pp.143-60, 1957.

N. Xue, O. Bojar, J. Hajič, M. Palmer, Z. Urešová et al., Not an interlingua, but close: comparison of English AMRs to Chinese and Czech, Proc. of LREC, pp.1765-1772, 2014.

S. Bailey and D. Meurers, Diagnosing Meaning Errors in Short Answers to Reading Comprehension Questions, Proceedings of the Third Workshop on Innovative Use of NLP for Building Educational Applications, pp.107-115, 2008.

N. Beuck, A. Köhn, and W. Menzel, Predictive incremental parsing and its evaluation, Computational Dependency Theory, vol.258, pp.186-206, 2013.

A. Boyd, EAGLE: an Error-Annotated Corpus of Beginning Learner German, Proceedings of the International Conference on Language Resources and Evaluation, 2010.

C. Bryant, M. Felice, and T. Briscoe, Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.793-805, 2017.

J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement, vol.20, issue.1, pp.37-46, 1960.

M. da Luz Videira Murta, "Vater und Sohn" im Anfängerunterricht: Eine Hörverstehensübung und ein Schreibauftrag, Fremdsprache Deutsch: Zeitschrift für die Praxis des Deutschunterrichts, vol.5, pp.46-47, 1991.

D. Dahlmeier, H. T. Ng, and S. Wu, Building a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English, Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, pp.22-31, 2013.

M. Dickinson and M. Ragheb, Dependency Annotation for Learner Corpora, Proceedings of the Eighth Workshop on Treebanks and Linguistic Theories (TLT-8), pp.59-70, 2009.

A. Díaz-Negrillo, D. Meurers, S. Valera, and H. Wunsch, Towards interlanguage POS annotation for effective learner corpora in SLA and FLT, Special Issue on Corpus Linguistics for Teaching and Learning, vol.36, pp.139-154, 2010.

F. Eppert, Deutsch mit Vater und Sohn: 10 Bildgeschichten von E. O. Plauen für den Unterricht Deutsch als Fremdsprache, 2001.

E. Fitzpatrick and M. S. Seegmiller, The Montclair Electronic Language Database Project, Language and Computers, Applied Corpus Linguistics. A Multidimensional Perspective, 2004.

K. A. Foth, A. Köhn, N. Beuck, and W. Menzel, Because Size Does Matter: The Hamburg Dependency Treebank, Proceedings of the Language Resources and Evaluation Conference, 2014.

K. A. Foth, Eine umfassende Constraint-Dependenz-Grammatik des Deutschen. Fachbereich Informatik, Hamburg. URN: urn:nbn:de:gbv, pp.18-228, 2006.

S. Granger, E. Dagneaux, F. Meunier, and M. Paquot, International Corpus of Learner English. Version 2. Handbook and CD-Rom. Presses universitaires de Louvain, 2009.

J. Anderson, M. Kogan, L. Palen, K. Anderson, K. Stowe et al., Far far away in Far Rockaway: Responses to risks and impacts during Hurricane Sandy through first-person social media narratives, Proceedings of the Information Systems for Crisis Response and Management (ISCRAM) Conference, 2016.

J. Barnes, R. Klinger, and S. Schulte im Walde, Assessing state-of-the-art sentiment models on state-of-the-art sentiment datasets, Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp.2-12, 2017.

J. Brotzge and W. Donner, The tornado warning process: A review of current research, challenges, and opportunities, Bulletin of the American Meteorological Society, vol.94, issue.11, pp.1715-1733, 2013.

J. Demuth, R. Morss, L. Palen, K. Anderson, J. Anderson et al., "sometimes da #beachlife ain't always da wave": Understanding people's evolving risk assessments and responses during Hurricane Sandy using Twitter, 2018.

C. Fiesler and N. Proferes, "Participant" perceptions of Twitter research ethics, Social Media + Society, vol.4, issue.1, 2018.

M. Finn and K. Crawford, The limits of crisis data: analytical and ethical challenges of using social and mobile data to understand disasters, GeoJournal, vol.80, issue.4, pp.491-502, 2015.

N. Dash and H. Gladwin, Evacuation decision making and behavioral responses: Individual and household, Natural Hazards Review, vol.8, issue.3, pp.69-77, 2007.

M. Kogan and L. Palen, Conversations in the eye of the storm: At-scale features of conversational structure in a high-tempo, high-stakes microblogging environment, Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI '18, vol.84, pp.1-84, 2018.

H. Lazrus, O. Wilhelmi, J. Henderson, R. E. Morss, and A. Dietrich, Information as intervention: How can hurricane risk communication reduce vulnerability?, 2017.

M. K. Lindell and R. W. Perry, The protective action decision model: Theoretical modifications and additional evidence, Risk Analysis, vol.32, issue.4, pp.616-632, 2012.

D. S. Mileti and J. H. Sorensen, Communication of Emergency Public Warnings: A Social Science Perspective and State-of-the-Art Assessment. Oak Ridge National Laboratory Rep, 1990.

R. E. Morss, J. L. Demuth, H. Lazrus, L. Palen, C. M. Barton et al., Hazardous weather prediction and communication in the modern information environment, Bulletin of the American Meteorological Society, vol.98, issue.12, pp.2653-2674, 2017.

L. Palen and K. M. Anderson, Crisis informatics: New data for extraordinary times, Science, vol.353, issue.6296, pp.224-225, 2016.

I. Ruin, C. Lutoff, B. Boudevillain, J. Creutin, S. Anquetin et al., Social and hydrological responses to extreme precipitations: An interdisciplinary strategy for postflood investigation, vol.6, pp.135-153, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00921452

S. Verma, S. Vieweg, W. Corvey, L. Palen, J. Martin et al., , 2011.

A. Roberts, R. Gaizauskas, M. Hepple, G. Demetriou, Y. Guo et al., Building a semantically annotated corpus of clinical texts, Journal of Biomedical Informatics, vol.42, issue.5, pp.950-966, 2009.

A. Bies, M. Ferguson, K. Katz, and R. Macintyre, Bracketing guidelines for Treebank II Style Penn Treebank project, 1995.

C. D. Manning, M. Surdeanu, J. Bauer et al., The Stanford CoreNLP Natural Language Processing Toolkit, Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp.55-60, 2014.

D. Albright, A. Lanfranchi, A. Fredriksen, W. F. Styler IV et al., Towards comprehensive syntactic and semantic annotations of the clinical narrative, Journal of the American Medical Informatics Association, vol.20, issue.5, pp.922-930, 2013.

D. Mcclosky, Self-trained biomedical parsing, 2009.

G. Project, , 2018.

F. Xia and M. Yetisgen-Yildiz, Clinical Corpus Annotation: Challenges and Strategies, Proceedings of the Third Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM'2012) under LREC-2012, 2012.

H. Dalianis, M. Hassel, A. Henriksson, and M. Skeppstedt, Stockholm EPR Corpus: A Clinical Database Used to Improve Health Care, Proceedings of Swedish Language Technology Conference, pp.17-18, 2012.

J. P. Ferraro, H. Daumé III, S. L. DuVall, W. W. Chapman, H. Harkema et al., Improving Performance of Natural Language Processing Part-of-Speech Tagging on Clinical Narratives through Domain Adaptation, Journal of the American Medical Informatics Association, vol.20, pp.931-939, 2013.

J. R. Finkel and C. D. Manning, Joint Parsing and Named Entity Recognition, Proceedings of Human Language Technology: 2009 Conference of the North American Chapter of the Association of Computational Linguistics, pp.326-334, 2009.

J. Kim, T. Ohta, Y. Tateisi, and J. Tsujii, GENIA corpus: a semantically annotated corpus for bio-text mining, Bioinformatics, vol.19, issue.1, pp.180-182, 2003.

J. Kim, T. Ohta, Y. Tateisi, and J. Tsujii, GENIA Corpus Manual: Encoding schemes for the corpus and annotation, 2006.

J. Fan, E. W. Yang, M. Jiang, R. Prasad, R. M. Loomis et al., Syntactic parsing of clinical text: guideline and corpus development with handling ill-formed sentences, Journal of the American Medical Informatics Association, vol.20, issue.6, pp.1168-1177, 2013.

K. B. Cohen, P. V. Ogren, L. Fox, and L. Hunter, Corpus design for biomedical natural language processing, Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, pp.38-45, 2005.

M. Jiang, Y. Huang, J. Fan, B. Tang, J. C. Denny et al., Parsing clinical text: how good are the state-of-the-art parsers?, BMC Medical Informatics and Decision Making, vol.15, issue.1, p.2, 2015.

M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz, Building a Large Annotated Corpus of English: The Penn Treebank, Computational Linguistics, vol.19, pp.313-330, 1993.

N. Choudhary, P. Pathak, and P. Patel, Annotating a Large Representative Corpus of Clinical Notes for Parts of Speech, Proceedings of 8th Linguistic Annotation Workshop, pp.87-92, 2014.

N. Chomsky, Aspects of the theory of syntax, 1965.

N. Chomsky, Lectures on government and binding: the Pisa lectures, 1981.

N. Chomsky, The minimalist program, 1995.

N. Alnazzawi, P. Thompson, and S. Ananiadou, Building a semantically annotated corpus for congestive heart and renal failure from clinical records and the literature, Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis, pp.69-74, 2014.

P. Pathak, P. Patel, V. Panchal, S. Soni, K. Dani et al., ezDI: A supervised NLP system for clinical narrative analysis, Proceedings of the 9th International Workshop on Semantic Evaluation, pp.412-416, 2015.

P. Harrison, S. Abney, E. Black, D. Flickinger, C. Gdaniec et al., Evaluating syntax performance of parser/grammars, Proceedings of the Natural Language Processing Systems Evaluation Workshop, 1991.

P. V. Ogren, G. K. Savova, and C. G. Chute, Constructing Evaluation Corpora for Automated Clinical Named Entity Recognition, Proceedings of the 12th World Congress on Health, pp.3143-3150, 2007.

P. Zweigenbaum, P. Jacquemart, N. Grabar, and B. Habert, Building a Text Corpus for Representing the Variety of Medical Language, Studies in health technology and informatics, vol.84, issue.1, pp.290-294, 2001.

S. V. Pakhomov, A. Coden, and C. G. Chute, Developing a corpus of clinical notes manually annotated for part-of-speech, International Journal of Medical Informatics, vol.75, issue.6, pp.418-429, 2006.

S. Petrov, L. Barrett, R. Thibaux, and D. Klein, Learning accurate, compact, and interpretable tree annotation, Proceedings of the 21st International conference on computational linguistics and the 44th annual meeting of the Association for Computational Linguistics, pp.433-440, 2006.
DOI : 10.3115/1220175.1220230
URL : http://dl.acm.org/ft_gateway.cfm?id=1220230&type=pdf

S. T. Wu, H. Liu, D. Li, C. Tao, M. A. Musen et al., Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis, Journal of the American Medical Informatics Association, vol.19, issue.e1, pp.149-156, 2012.

T. Hao, A. Rusanov, M. R. Boland, and C. Weng, Clustering clinical trials with similar eligibility criteria features, Journal of Biomedical Informatics, vol.52, pp.112-120, 2014.
DOI : 10.1016/j.jbi.2014.01.009
URL : https://doi.org/10.1016/j.jbi.2014.01.009

V. Vincze, G. Szarvas, R. Farkas, G. Móra, and J. Csirik, The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes, Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, pp.38-45, 2008.

W. W. Chapman, G. K. Savova, J. Zheng, M. Tharp, and R. Crowley, Anaphoric reference in clinical reports: Characteristics of an annotated corpus, Journal of Biomedical Informatics, vol.45, issue.3, pp.507-521, 2012.
DOI : 10.1016/j.jbi.2012.01.010
URL : https://doi.org/10.1016/j.jbi.2012.01.010

W. F. Styler IV, S. Bethard, S. Finan, M. Palmer et al., Temporal Annotation in the Clinical Domain, Transactions of the Association for Computational Linguistics, vol.2, pp.143-154, 2014.

Y. Wang, Annotating and recognising named entities in clinical notes, Proceedings of the ACL-IJCNLP 2009 Student Research Workshop, pp.18-26, 2009.
DOI : 10.3115/1667884.1667888
URL : http://dl.acm.org/ft_gateway.cfm?id=1667888&type=pdf

Y. Tateisi, A. Yakushiji, T. Ohta, and J. Tsujii, Syntax annotation for the GENIA corpus, Companion Volume to the Proceedings of Second international joint conference on natural language processing, pp.220-225, 2005.

Y. Tateisi and J. Tsujii, Part-of-Speech Annotation of Biology Research Abstracts, Proceedings of 4th International Conference on Language Resources and Evaluation (LREC), pp.1267-1270, 2004.

Y. Oda, G. Neubig, S. Sakti, T. Toda, and S. Nakamura, Ckylark: A more robust PCFG-LA parser, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics, pp.41-45, 2015.
DOI : 10.3115/v1/n15-3009

URL : https://doi.org/10.3115/v1/n15-3009

Z. Shi, A. Sarkar, and F. Popowich, Simultaneous Identification of Biomedical Named-Entity and Functional Relations Using Statistical Parsing Techniques, Proceedings of Human Language Technology: 2007 Conference of the North American Chapter of the Association of Computational Linguistics, pp.161-164, 2007.
DOI : 10.3115/1614108.1614149

URL : http://dl.acm.org/ft_gateway.cfm?id=1614149&type=pdf

S. Cao, N. Xue, I. da Cunha, M. Iruskieta, and C. Wang, Discourse Segmentation for Building a RST Chinese Treebank, Proceedings of the 6th Workshop Recent Advances in RST and Related Formalisms, pp.73-81, 2017.

S. Cao, I. da Cunha, and M. Iruskieta, A Corpus-based Approach for Spanish-Chinese Language Learning, Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLP-TEA3), pp.97-106, 2016.

S. Cao, I. da Cunha, and N. Bel, An analysis of the Concession relation based on the Spanish discourse marker aunque in a Spanish-Chinese parallel corpus, Procesamiento del Lenguaje Natural, vol.56, pp.81-88, 2016.

S. Cao, I. da Cunha, and M. Iruskieta, Toward the Elaboration of a Spanish-Chinese Parallel Annotated Corpus, EPiC Series of Language and Linguistics, vol.2, pp.315-324, 2017.

S. Cao and H. Gete, Using Discourse Information for Education with a Spanish-Chinese Parallel Corpus, Proceedings of the 11th edition of the Language Resources and Evaluation Conference (LREC'2018), pp.2254-2261, 2018.

L. Carlson, D. Marcu, and M. E. Okurowski, Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory, Proceedings of the 2nd SIGDIAL Workshop on Discourse and Dialogue, pp.1-10, 2001.

C. Songren, Comparing Structures of Essays in Chinese and English, 1985.

I. da Cunha, A Symbolic Corpus-based Approach to Detect and Solve the Ambiguity of Discourse Markers, Research in Computing Science, vol.70, pp.95-106, 2013.

I. da Cunha and M. Iruskieta, Comparing rhetorical structures of different languages: The influence of translation strategies, Discourse Studies, vol.12, issue.5, pp.563-598, 2010.

I. da Cunha, E. SanJuan, J.-M. Torres-Moreno, M. Lloberes, and I. Castellón, DiSeg 1.0: The First System for Spanish Discourse Segmentation, Expert Systems with Applications (ESWA), vol.39, issue.2, pp.1671-1678, 2012.

I. da Cunha, J.-M. Torres-Moreno, and G. Sierra, On the Development of the RST Spanish Treebank, Proceedings of the 5th Linguistic Annotation Workshop, pp.1-10, 2011.

I. da Cunha, J.-M. Torres-Moreno, G. Sierra, L.-A. Cabrera-Diego, B.-G. Castro-Rolón et al., The RST Spanish Treebank On-line Interface, Proceedings of Recent Advances in Natural Language Processing, pp.698-703, 2011.

J. Eckle-Kohler, R. Kluge, and I. Gurevych, On the Role of Discourse Markers for Discriminating Claims and Premises in Argumentative Discourse, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp.2236-2242, 2015.

G. Ramsay, Linearity in Rhetorical Organisation: A Comparative Cross-cultural Analysis of Newstext from the People's Republic of China and Australia, International Journal of Applied Linguistics, vol.10, issue.2, pp.241-258, 2000.

G. Ramsay, What Are They Getting At? Placement of Important Ideas in Chinese Newstext: A Contrastive Analysis with Australian Newstext, Australian Review of Applied Linguistics, vol.24, issue.2, pp.17-34, 2001.

E. Hovy and J. Lavid, Towards a 'Science' of Corpus Annotation: A New Methodological Challenge for Corpus Linguistics, International Journal of Translation, vol.22, issue.1, pp.13-36, 2010.

O. Imaz and M. Iruskieta, Deliberation as Genre: Mapping Argumentation through Relational Discourse Structure, Proceedings of the 6th Workshop Recent Advances in RST and Related Formalisms, pp.1-10, 2017.

M. Iruskieta, M. J. Aranzabe, A. Diaz de Ilarraza, I. Gonzalez-Dios, M. Lersundi et al., The RST Basque TreeBank: an online search interface to check rhetorical relations, Proceedings of IV Workshop A RST e os Estudos do Texto, pp.40-49, 2013.

M. Iruskieta, A. Díaz de Ilarraza, and M. Lersundi, The annotation of the Central Unit in Rhetorical Structure Trees: A Key Step in Annotating Rhetorical Relations, Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp.466-475, 2014.

M. Iruskieta, I. da Cunha, and M. Taboada, A Qualitative Comparison Method for Rhetorical Structures: Identifying different discourse structures in multilingual corpora, Language Resources and Evaluation, vol.49, pp.263-309, 2015.

M. Iruskieta, G. Labaka, and J. D. Antonio, Detecting the central units in two different genres and languages: a preliminary study of Brazilian Portuguese and Basque texts, Procesamiento del Lenguaje Natural, vol.56, pp.65-72, 2016.

Y. Li, W. Feng, and G. Zhou, Elementary Discourse Unit in Chinese Discourse Structure Analysis, Chinese Lexical Semantics, vol.7717, pp.186-198, 2012.

W. C. Mann and S. A. Thompson, Rhetorical Structure Theory: Toward a functional theory of text organization, Text & Talk, vol.8, issue.3, pp.243-281, 1988.

D. Marcu, The rhetorical parsing of unrestricted texts: A surface-based approach, Computational Linguistics, vol.26, issue.3, pp.395-448, 2000.

M. O'Donnell, RSTTool 2.4 - A Markup Tool For Rhetorical Structure Theory, Proceedings of First International Conference on Natural Language Generation (INLG'2000), pp.253-256, 2000.

T. A. S. Pardo, Software vai melhorar compreensão de textos em computadores, 2005.

T. A. S. Pardo, M. G. V. Nunes, and L. H. M. Rino, DiZer: An Automatic Discourse Analyzer for Brazilian Portuguese, Lecture Notes in Artificial Intelligence, vol.3171, pp.224-234, 2008.

T. A. S. Pardo and E. R. M. Seno, Rhetalho: um corpus de referência anotado retoricamente, Anais do V Encontro de Corpora, 2005.

R. Prasad, N. Dinesh, A. Lee, E. Miltsakaki, L. Robaldo et al., The Penn Discourse Treebank 2.0, Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC'2008), pp.2961-2968, 2008.

Q. Wusong, Jiyu xiucijiegoulilun de hanyuxinwenpinglun yupianjiegou yanjiu, 2010.

A. Rafalovitch and R. Dale, United Nations general assembly resolutions: A six-languages parallel corpus, Proceedings of Machine Translation Summit XII, pp.292-299, 2009.

P. Resnik, M. B. Olsen, and M. Diab, The Bible as a Parallel Corpus: Annotating the 'Book of 2000 Tongues', Computers and the Humanities, vol.33, issue.1-2, pp.129-153, 1999.

M. Stede and A. Neumann, Potsdam Commentary Corpus 2.0: Annotation for Discourse Research, Proceedings of the International Conference on Language Resources and Evaluation (LREC'2014), pp.925-929, 2014.

M. Taboada and J. Renkema, Discourse Relations Reference Corpus, 2008.

S. Toldova, D. Pisarevskaya, M. Ananyeva, M. Kobozeva, A. Nasedkin et al., Rhetorical relation markers in Russian RST Treebank, Proceedings of 6th Workshop Recent Advances in RST and Related Formalisms, pp.29-33, 2017.

T. A. van Dijk, Macrostructures: An Interdisciplinary Study of Global Structures in Discourse, Interaction, and Cognition, 1980.

W. Ling, G. Xiang, C. Dyer, A. Black, and I. Trancoso, Microblogs as Parallel Corpora, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL'2013), pp.176-186, 2013.

W. Shangyi, On Application of computer-based corpora in translation, Proceedings of 2nd International Conference on Computer, Electrical, and Systems Sciences, and Engineering (CESSE' 2014), pp.173-178, 2014.

M. Yue, Discursive Usage of Six Chinese Punctuation Marks, Proceedings of the COLING/ACL 2006 Student Research Workshop, pp.43-48, 2006.

L. Zhou, B. Li, Z. Wei, and K.-F. Wong, The CUHK Discourse Treebank for Chinese: Annotating Explicit Discourse Connectives for the Chinese Treebank, Proceedings of the International Conference on Language Resources and Evaluation (LREC'2014), pp.942-949, 2014.

A. Zeldes, rstWeb - A Browser-based Annotation Interface for Rhetorical Structure Theory and Discourse Relations, Proceedings of NAACL-HLT 2016 System Demonstrations, pp.1-5, 2016.

A. Bies, M. Ferguson, K. Katz, and R. MacIntyre, Bracketing guidelines for Treebank II style, 1995.

A. Bies, J. Mott, C. Warner, and S. Kulick, English Web Treebank. LDC2012T13, Linguistic Data Consortium, 2012.

J. D. Choi and M. Palmer, Robust constituent-to-dependency conversion for English, Proceedings of the 9th International Workshop on Treebanks and Linguistic Theories (TLT 2010), pp.55-66, 2010.

M.-C. de Marneffe and C. D. Manning, Stanford typed dependencies manual, 2013.

S. Dipper, M. Götze, and S. Skopeteas, Information structure in cross-linguistic corpora: Annotation guidelines for phonology, morphology, syntax, semantics, and information structure, Interdisciplinary Studies on Information Structure, p.7, 2007.

R. Garside and N. Smith, A hybrid grammatical tagger: CLAWS4, Corpus Annotation: Linguistic Information from Computer Text Corpora, pp.102-121, 1997.

G. Leech, T. Mcenery, and M. Weisser, SPAAC speech-act annotation scheme, 2003.

W. C. Mann and S. A. Thompson, Rhetorical Structure Theory: Toward a functional theory of text organization, Text, vol.8, issue.3, pp.243-281, 1988.

C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard et al., The Stanford CoreNLP natural language processing toolkit, Proceedings of ACL 2014: System Demonstrations, pp.55-60, 2014.

J. Nivre, Ž. Agić, L. Ahrenberg, M. J. Aranzabe, M. Asahara et al., Normunds Grūzītis, Linh Hà Mỹ, Dag Haug, Barbora Hladká, Petter Hohle, Radu Ion, Elena Irimia

L. Burnard, Reference guide for the British National Corpus, 2007.

P. Cook, A. Fazly, and S. Stevenson, The VNC-Tokens dataset, Proceedings of the LREC workshop towards a shared task for Multiword Expressions, pp.19-22, 2008.

R. Ehren, Literal or idiomatic? Identifying the reading of single occurrences of German multiword expressions using word embeddings, Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp.103-112, 2017.

A. Fazly, P. Cook, and S. Stevenson, Unsupervised type and token identification of idiomatic expressions, Computational Linguistics, vol.35, issue.1, pp.61-103, 2009.

A. Ferraresi, E. Zanchetta, M. Baroni, and S. Bernardini, Introducing and evaluating ukWaC, a very large web-derived corpus of English, Proceedings of LREC, pp.47-54, 2008.

W. Gharbieh, V. C. Bhavsar, and P. Cook, A word embedding approach to identifying verb-noun idiomatic combinations, Proceedings of the 12th Workshop on Multiword Expressions, pp.112-118, 2016.

P. Isabelle, C. Cherry, and G. Foster, A challenge set approach to evaluating machine translation, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.2476-2486, 2017.

I. Korkontzelos, T. Zesch, F. M. Zanzotto, and C. Biemann, SemEval-2013 task 5: Evaluating phrasal semantics, Proceedings of SemEval, pp.39-47, 2013.

C. Liu and R. Hwa, Phrasal substitution of idiomatic expressions, Proceedings of NAACL, pp.363-373, 2016.

J. Pennington, R. Socher, and C. D. Manning, GloVe: Global vectors for word representation, Proceedings of EMNLP 2014, pp.1532-1543, 2014.

I. A. Sag, T. Baldwin, F. Bond, A. Copestake, and D. Flickinger, Multiword expressions: A pain in the neck for NLP, Proceedings of CICLING, pp.1-15, 2002.

G. D. Salton, R. J. Ross, and J. D. Kelleher, An empirical study of the impact of idioms on phrase based statistical machine translation of English to Brazilian-Portuguese, Proceedings of the 3rd Workshop on Hybrid Approaches to Translation (HyTra), pp.36-41, 2014.

G. D. Salton, R. J. Ross, and J. D. Kelleher, Evaluation of a substitution method for idiom transformation in statistical machine translation, Proceedings of the 10th Workshop on Multiword Expressions, pp.38-42, 2014.

C. Sporleder and L. Li, Unsupervised recognition of literal and non-literal use of idiomatic expressions, Proceedings of EACL, pp.754-762, 2009.

C. Sporleder, L. Li, P. J. Gorinski, and X. Koch, Idioms in context: The IDIX corpus, Proceedings of LREC, pp.639-646, 2010.

L. Williams, C. Bannister, M. Arribas-Ayllon, A. Preece, and I. Spasić, The role of idioms in sentiment analysis, Expert Systems with Applications, vol.42, issue.21, pp.7375-7385, 2015.

T. Baldwin, C. Bannard, T. Tanaka, and D. Widdows, An empirical model of multiword expression decomposability, Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, pp.89-96, 2003.

T. Baldwin and S. Kim, Multiword expressions, Handbook of Natural Language Processing, vol.2, pp.267-292, 2010.

C. J. Bannard, Acquiring phrasal lexicons from corpora, 2006.

T. Brants, A. C. Popat, P. Xu, F. J. Och, and J. Dean, Large language models in machine translation, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007.

M. Carpuat and M. Diab, Task-based evaluation of multiword expressions: a pilot study in statistical machine translation, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp.242-245, 2010.

Y. Chen, M. Zhou, and S. Wang, Reranking answers for definitional QA using language modeling, Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pp.1081-1088, 2006.
DOI : 10.3115/1220175.1220311

URL : http://dl.acm.org/ft_gateway.cfm?id=1220311&type=pdf

K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation, 2014.
DOI : 10.3115/v1/d14-1179

URL : https://hal.archives-ouvertes.fr/hal-01433235

M. Collins, B. Roark, and M. Saraclar, Discriminative syntactic language modeling for speech recognition, Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pp.507-514, 2005.
DOI : 10.3115/1219840.1219903

URL : http://dl.acm.org/ft_gateway.cfm?id=1219903&type=pdf

L. dos Santos Pinheiro and M. Dras, Stock market prediction with deep learning: A character-based neural language model for event-based trading, Proceedings of the Australasian Language Technology Association Workshop, pp.6-15, 2017.

W. Gharbieh, V. Bhavsar, and P. Cook, Deep learning models for multiword expression identification, Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017), pp.54-64, 2017.
DOI : 10.18653/v1/s17-1006

URL : https://doi.org/10.18653/v1/s17-1006

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural computation, vol.9, issue.8, pp.1735-1780, 1997.

G. Katz and E. Giesbrecht, Automatic identification of non-compositional multi-word expressions using latent semantic analysis, Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties, pp.12-19, 2006.
DOI : 10.3115/1613692.1613696

URL : http://dl.acm.org/ft_gateway.cfm?id=1613696&type=pdf

S. N. Kim and T. Baldwin, How to pick out token instances of English verb-particle constructions, Language Resources and Evaluation, vol.44, issue.1-2, pp.97-113, 2010.

I. Korkontzelos and S. Manandhar, Can recognising multiword expressions improve shallow parsing?, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp.636-644, 2010.

W. Ling, C. Dyer, A. W. Black, I. Trancoso, R. Fermandez et al., Finding function in form: Compositional character models for open vocabulary word representation, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp.1520-1530, 2015.
DOI : 10.18653/v1/d15-1176

URL : https://doi.org/10.18653/v1/d15-1176

T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient estimation of word representations in vector space, Proceedings of Workshop at the International Conference on Learning Representations, 2013.

T. Mikolov, I. Sutskever, A. Deoras, H. Le, and S. Kombrink, Subword language modeling with neural networks, 2012.

J. Mitchell and M. Lapata, Composition in distributional models of semantics, Cognitive science, vol.34, issue.8, pp.1388-1429, 2010.
DOI : 10.1111/j.1551-6709.2010.01106.x

URL : https://www.era.lib.ed.ac.uk/bitstream/1842/4927/1/Mitchell2011.pdf

F. Peng, D. Schuurmans, S. Wang, and V. Keselj, Language independent authorship attribution using character level language models, Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics, vol.1, pp.267-274, 2003.

S. Reddy, D. Mccarthy, and S. Manandhar, An empirical study on compositionality in compound nouns, Proceedings of 5th International Joint Conference on Natural Language Processing, pp.210-218, 2011.

B. Salehi, P. Cook, and T. Baldwin, Using distributional similarity of multi-way translations to predict multiword expression compositionality, Proceedings of the 14th Conference of the EACL (EACL), pp.472-481, 2014.

B. Salehi, P. Cook, and T. Baldwin, A word embedding approach to predicting the compositionality of multiword expressions, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Denver, pp.977-983, 2015.

G. Salton, R. Ross, and J. Kelleher, Idiom token classification using sentential distributed semantics, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.194-204, 2016.

C. N. dos Santos and B. Zadrozny, Learning character-level representations for part-of-speech tagging, Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp.1818-1826, 2014.

N. Schneider, E. Danchik, C. Dyer, and N. A. Smith, Discriminative lexical semantic segmentation with gaps: Running the MWE gamut, Transactions of the Association for Computational Linguistics, vol.2, pp.193-206, 2014.

P. Schone and D. Jurafsky, Is knowledge-free induction of multiword unit dictionary headwords a solved problem?, Proceedings of the 6th Conference on Empirical Methods in Natural Language Processing, pp.100-108, 2001.

S. Schulte-im-walde, S. Müller, and S. Roller, Exploring vector space models to predict the compositionality of German noun-noun compounds, Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, vol.1, pp.255-265, 2013.

R. Susanto, H. L. Chieu, and W. Lu, Learning to capitalize with character-level recurrent neural networks: An empirical study, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp.2090-2095, 2016.

C. von der Heide and S. Borgwaldt, Assoziationen zu Unter-, Basis- und Oberbegriffen. Eine explorative Studie, Proceedings of the 9th Norddeutsches Linguistisches Kolloquium, pp.51-74, 2009.

L. Ahrenberg, LinES: An English-Swedish parallel treebank, Proc. of NODALIDA, pp.270-273, 2007.

T. Baldwin and S. Kim, Multiword expressions, Handbook of Natural Language Processing, pp.267-292, 2010.

A. Bies, J. Mott, C. Warner, and S. Kulick, English Web Treebank, 2012.

M. Constant, G. Eryiğit, J. Monti, L. van der Plas, C. Ramisch et al., Multiword expression processing: a survey, Computational Linguistics, vol.43, issue.4, pp.837-892, 2017.

M. Farahmand and J. Nivre, Modeling the statistical idiosyncrasy of multiword expressions, Proc. of the 11th Workshop on Multiword Expressions, pp.34-38, 2015.

C. Ramisch, S. R. Cordeiro, A. Savary, V. Vincze, V. Barbu Mititelu et al., Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions, Proc. of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02152557

C. Ramisch, Multiword Expressions Acquisition: A Generic and Open Framework. Theory and Applications of Natural Language Processing, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01199863

I. Sag, T. Baldwin, F. Bond, A. Copestake, and D. Flickinger, Multiword expressions: a pain in the neck for NLP, Computational Linguistics and Intelligent Text Processing, vol.2276, pp.189-206, 2002.

A. Savary, C. Ramisch, S. R. Cordeiro, F. Sangati, V. Vincze et al., The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions, Proc. of the 13th Workshop on Multiword Expressions (MWE 2017), pp.31-47, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01865575

N. Schneider and N. A. Smith, A corpus and model integrating multiword expressions and supersenses, Proc. of NAACL-HLT, pp.1537-1547, 2015.

N. Schneider, S. Onuffer, N. Kazour, E. Danchik, M. T. Mordowanec et al., Comprehensive annotation of multiword expressions in a social web corpus, Proc. of LREC, pp.455-461, 2014.

N. Schneider, D. Hovy, A. Johannsen, and M. Carpuat, SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM), Proc. of SemEval, pp.546-559, 2016.

N. Silveira, T. Dozat, M.-C. de Marneffe, S. R. Bowman, M. Connor et al., A gold standard dependency corpus for English, Proc. of LREC, pp.2897-2904, 2014.

D. Vrandečić and M. Krötzsch, Wikidata: A free collaborative knowledgebase, Communications of the ACM, vol.57, issue.10, pp.78-85, 2014.

D. Zeman, M. Popel, M. Straka, J. Hajič, J. Nivre et al., Çağrı Çöltekin, 2008.

M.-C. de Marneffe, T. Dozat, N. Silveira, K. Haverinen, F. Ginter et al., Universal Stanford dependencies: A cross-linguistic typology, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pp.4585-4592, 2014.

S. Kahane, M. Courtin, and K. Gerdes, Multi-word annotation in Syntactic treebanks: Proposition for Universal Dependency, Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories, pp.181-189, 2017.

M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz, Building a Large Annotated Corpus of English: The Penn Treebank, Computational Linguistics, vol.19, issue.2, pp.313-330, 1993.

J. Nivre, M.-C. de Marneffe, F. Ginter, Y. Goldberg, J. Hajič et al., Universal Dependencies v1: A Multilingual Treebank Collection, Proceedings of the Tenth International Conference on Language Resources and Evaluation, 2016.

I. Sag, T. Baldwin, F. Bond, A. Copestake, and D. Flickinger, Multiword expressions: A pain in the neck for NLP, Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing, CICLing, pp.1-15, 2002.

N. Schneider, S. Onuffer, N. Kazour, E. Danchik, M. Mordowanec et al., Comprehensive annotation of multiword expressions in a social web corpus, Proceedings of the Ninth International Conference on Language Resources and Evaluation, pp.455-461, 2014.

M. Candito, M. Constant, C. Ramisch, A. Savary, Y. Parmentier et al., Annotation d'expressions polylexicales verbales en français, Proceedings of Traitement Automatique des Langues Naturelles (TALN), pp.1-9, 2017.

M. Constant, G. Eryiğit, J. Monti, L. van der Plas, C. Ramisch et al., Multiword expression processing: A survey, Computational Linguistics, vol.43, issue.4, pp.837-892, 2017.

R. W. Gibbs, J. M. Bogdanovich, J. R. Sykes, and D. J. Barr, Metaphor in idiom comprehension, Journal of Memory and Language, vol.37, pp.141-154, 1997.

R. W. Gibbs, What do idioms really mean?, Journal of Memory and Language, vol.31, issue.4, pp.485-506, 1992.

B. Guillaume, K. Fort, and N. Lefebvre, Crowdsourcing complex language resources: Playing to annotate dependency syntax, Proceedings of the 26th International Conference on Computational Linguistics (COLING): Technical Papers, pp.3041-3052, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01378980

C. Krstev and A. Savary, Games on multiword expressions for community building, INFOtheca: Journal of Information and Library Science, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01635502

M. Lafourcade, Making people play for lexical acquisition, Proceedings of the 7th Symposium on Natural Language Processing, 2007.

C. Madge, J. Chamberlain, U. Kruschwitz, and M. Poesio, Experiment-driven development of a GWAP for marking segments in text, Extended Abstracts Publication of the Annual Symposium on Computer-Human Interaction in Play, pp.397-404, 2017.

J. Nivre, M.-C. de Marneffe, F. Ginter, Y. Goldberg, J. Hajič et al., Universal dependencies v1: A multilingual treebank collection, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), 2016.

M. Poesio, J. Chamberlain, U. Kruschwitz, L. Robaldo, and L. Ducceschi, Phrase detectives: Utilizing collective intelligence for internet-scale language resource creation, ACM Trans. Interact. Intell. Syst, vol.3, issue.1, p.44, 2013.

C. Ramisch, S. Cordeiro, L. Zilio, M. Idiart, A. Villavicencio et al., How naked is the naked truth? A multilingual lexicon of nominal compound compositionality, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.2, pp.156-161, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01459911

H. Ali, Y. Chali, and S. A. Hasan, Automation of question generation from sentences, Proceedings of QG2010: The Third Workshop on Question Generation, pp.58-67, 2010.

S. Alkuhlani and N. Habash, A corpus for modeling morpho-syntactic agreement in Arabic: gender, number and rationality, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers, vol.2, pp.357-362, 2011.

E. Charniak, Statistical parsing with a context-free grammar and word statistics, p.18, 1997.

M. Diab, N. Habash, O. Rambow, and R. Roth, LDC Arabic treebanks and associated corpora: Data divisions manual, 2013.

I. Gayo, Question parsing for QA in Spanish, Proceedings of the Second Student Research Workshop associated with RANLP 2011, pp.73-78, 2011.

S. Green and C. D. Manning, Better Arabic parsing: Baselines, evaluations, and analysis, Proceedings of the 23rd International Conference on Computational Linguistics, pp.394-402, 2010.

N. Habash and R. M. Roth, CATiB: The Columbia Arabic treebank, Proceedings of the ACL-IJCNLP 2009 conference short papers, pp.221-224, 2009.

N. Habash, A. Soudi, and T. Buckwalter, On Arabic Transliteration, Arabic Computational Morphology: Knowledge-based and Empirical Methods, 2007.

N. Y. Habash, Introduction to Arabic natural language processing, Synthesis Lectures on Human Language Technologies, vol.3, issue.1, pp.1-187, 2010.

B. Haddow and P. Koehn, Analysing the effect of out-of-domain data on SMT systems, Proceedings of the Seventh Workshop on Statistical Machine Translation, pp.422-432, 2012.

T. Hara, T. Matsuzaki, Y. Miyao, and J. Tsujii, Exploring difficulties in parsing imperatives and questions, Proceedings of 5th International Joint Conference on Natural Language Processing, pp.749-757, 2011.

M. Heilman and N. A. Smith, Good question! Statistical ranking for question generation, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp.609-617, 2010.

U. Hermjakob, Parsing and question classification for question answering, Proceedings of the workshop on Open-domain question answering, vol.12, pp.1-6, 2001.

J. Judge, A. Cahill, and J. Van-genabith, Questionbank: Creating a corpus of parse-annotated questions, Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pp.497-504, 2006.

D. Klein and C. D. Manning, Fast exact inference with a factored model for natural language parsing, Advances in neural information processing systems, pp.3-10, 2003.

S. Kübler, R. Mcdonald, and J. Nivre, Dependency parsing, Synthesis Lectures on Human Language Technologies, vol.1, issue.1, pp.1-127, 2009.

M. Maamouri, A. Bies, T. Buckwalter, and W. Mekki, The Penn Arabic Treebank: Building a large-scale annotated Arabic corpus, NEMLAR conference on Arabic language resources and tools, vol.27, pp.466-467, 2004.

M. Makatchev, I. Fanaswala, A. Abdulsalam, B. Browning, W. Ghazzawi et al., Dialogue patterns of an Arabic robot receptionist, Human-Robot Interaction (HRI), 2010 5th ACM/IEEE International Conference on, pp.167-168, 2010.

Y. Marton, N. Habash, and O. Rambow, Dependency parsing of modern standard Arabic with lexical and inflectional features, Computational Linguistics, vol.39, issue.1, pp.161-194, 2013.
DOI : 10.1162/coli_a_00138

S. Petrov, P. Chang, M. Ringgaard, and H. Alshawi, Uptraining for accurate deterministic question parsing, Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP '10, pp.705-713, 2010.

D. Seddah and M. Candito, Hard time parsing questions: Building a questionbank for French, Tenth International Conference on Language Resources and Evaluation (LREC), 2016.
URL : https://hal.archives-ouvertes.fr/hal-01457184

S. Sekine, The domain dependence of parsing. ANLC '97, pp.96-102, 1997.

I. V. Serban, A. García-Durán, C. Gulcehre, S. Ahn, S. Chandar et al., Generating factoid questions with recurrent neural networks: The 30M factoid question-answer corpus, 2016.

O. Smrž, The other Arabic treebank: Prague dependencies and functions, Arabic Computational Linguistics: Current Implementations, p.104, 2006.

M. Steedman, M. Osborne, A. Sarkar, S. Clark, R. Hwa et al., Bootstrapping statistical parsers from small datasets, Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics, vol.1, pp.331-338, 2003.
DOI : 10.3115/1067807.1067851

URL : http://dl.acm.org/ft_gateway.cfm?id=1067851&type=pdf

Y. Su and X. Yan, Cross-domain semantic parsing via paraphrasing, 2017.
DOI : 10.18653/v1/d17-1127

URL : https://doi.org/10.18653/v1/d17-1127

D. Taji, N. Habash, and D. Zeman, Universal dependencies for Arabic, Proceedings of the Third Arabic Natural Language Processing Workshop, pp.166-176, 2017.
DOI : 10.18653/v1/w17-1320

URL : https://doi.org/10.18653/v1/w17-1320

D. Taji, J. E. Gizuli, and N. Habash, An Arabic dependency treebank in the travel domain, Proceedings of the 3rd Workshop on Open-Source Arabic Corpora and Processing Tools, 2018.

R. Tsarfaty, J. Nivre, and E. Andersson, Joint evaluation of morphological segmentation and syntactic parsing, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, vol.2, pp.6-10, 2012.

M. van der Wees, A. Bisazza, W. Weerkamp, and C. Monz, What's in a domain? Analyzing genre and topic differences in statistical machine translation, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol.2, pp.560-566, 2015.

Y. Zaki, H. Hajjar, M. Hajjar, and G. Bernard, ACM. 1. Two universal categories, that is, valid for all languages participating in the task: (a) LIGHT VERB CONSTRUCTIONS (LVC), divided into two subcategories: i. LVCs in which the verb is semantically totally bleached (LVC.full), DE eine Rede halten 'hold a speech'?'give a speech', ii. LVCs in which the verb adds a causative meaning to the noun (LVC.cause), 3 e.g. PL narazi´cnarazi´c na straty 'expose to losses' (b) VERBAL IDIOMS (VID), 4 grouping all VMWEs not belonging to other categories, and most often having a relatively high degree of semantic non-compositionality, Proceedings of the International Conference on Big Data and Advanced Wireless Technologies, p.31, 2016.

, REFL) either always cooccurs with a given verb, or markedly changes its meaning or subcategorisation frame, e.g. PT se formar 'REFL form'?'graduate' (b) VERB-PARTICLE CONSTRUCTIONS (VPC)-pervasive in Germanic languages and Hungarian, rare in Romance and absent in Slavic languages-with two subcategories: i. fully non-compositional VPCs (VPC.full), 6 in which the particle totally changes the meaning of the verb, e.g. HU berúg 'in-kick'?'get drunk' ii. semi non-compositional VPCs (VPC.semi), 7 in which the particle adds a partly predictable but non-spatial meaning to the verb, e.g. EN wake up (c) MULTI-VERB CONSTRUCTIONS (MVC) 8-close to semantically non-compositional serial verbs in Asian languages like Chinese, Hindi, Indonesian and Japanese, Three quasi-universal categories, valid for some language groups or languages, but not all: (a) INHERENTLY REFLEXIVE VERBS (IRV) 5-pervasive in Romance and Slavic languages, and present in Hungarian and German-in which the reflexive clitic

3. One language-specific category, introduced for Italian:

H. Al Saied, M. Constant, and M. Candito, The ATILF-LLF system for Parseme shared task: a transition-based verbal multiword expression tagger, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.127-132, 2017.

M. J. Aranzabe, A. Atutxa, K. Bengoetxea, A. Díaz de Ilarraza, and I. Goenaga, Automatic conversion of the Basque dependency treebank to Universal Dependencies, Proceedings of the Fourteenth International Workshop on Treebanks and Linguistic Theories (TLT14), pp.233-241, 2015.

Š. Arhar Holdt, V. Gorjanc, and S. Krek, FidaPLUS corpus of Slovenian: the new generation of the Slovenian reference corpus: its design and tools, Proceedings of the Corpus Linguistics Conference, CL2007, 2007.

T. Baldwin and S. Kim, Multiword expressions, Handbook of Natural Language Processing, pp.267-292, 2010.

R. A. Bhat, R. Bhatt, A. Farudi, P. Klassen et al., S. R. Vishnu, and F. Xia, 2015.

O. Bojar, R. Chatterjee, C. Federmann, Y. Graham, B. Haddow et al., Findings of the 2016 Conference on Machine Translation (WMT16). In Proceedings of the First Conference on Machine Translation (WMT16), vol.2, pp.131-198, 2016.

T. Boroş, S. Pipa, V. Barbu Mititelu, and D. Tufiş, A data-driven approach to verbal multiword expression detection. PARSEME Shared Task system description paper, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.121-126, 2017.

M. Candito and M. Constant, Strategies for contiguous multiword expression analysis and dependency parsing, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, vol.1, pp.743-753, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01022415

M. Candito and D. Seddah, Le corpus sequoia : annotation syntaxique et exploitation pour l'adaptation d'analyseur par pont lexical, Proceedings of TALN 2012, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00698938

M. Constant and J. Nivre, A transition-based system for joint lexical and syntactic analysis, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.161-171, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01808689

M. Constant, G. Eryiğit, J. Monti, L. van der Plas, C. Ramisch et al., Multiword expression processing: A survey, Computational Linguistics, vol.43, issue.4, pp.837-892, 2017.

D. Csendes, J. Csirik, T. Gyimóthy, and A. Kocsor, The Szeged TreeBank, Proceedings of the 8th International Conference on Text, Speech and Dialogue, TSD 2005, pp.123-132, 2005.

J. R. Finkel and C. D. Manning, Joint parsing and named entity recognition, HLT-NAACL, pp.326-334, 2009.

S. Green, M. De-marneffe, J. Bauer, and C. D. Manning, Multiword expression identification with tree substitution grammars: A parsing tour de force with French, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp.725-735, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01111383

S. Green, M. De-marneffe, and C. D. Manning, Parsing models for identifying multiword expressions, Computational Linguistics, vol.39, issue.1, pp.195-227, 2013.

N. Klyueva, A. Doucet, and M. Straka, Neural networks for multi-word expression detection, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.60-65, 2017.

S. Koeva, I. Stoyanova, S. Leseva, R. Dekova, T. Dimitrova, and E. Tarpomanova, The Bulgarian National Corpus: Theory and practice in corpus design, Journal of Language Modelling, vol.0, issue.1, pp.65-110, 2012.

S. Krek, K. Dobrovoljc, T. Erjavec, S. Može, N. Ledinek et al., Training corpus ssj500k 2.0. Slovenian language resource repository CLARIN, 2017.

J. Le-roux, A. Rozenknop, and M. Constant, Syntactic parsing and compound recognition via dual decomposition: Application to French, Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp.1875-1885, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01074298

V. Lyding, E. Stemle, C. Borghetti, M. Brunello, S. Castagnoli et al., The PAISÀ Corpus of Italian Web Texts, Proceedings of the 9th Web as Corpus Workshop (WaC-9), pp.36-43, 2014.

A. Maldonado, L. Han, E. Moreau, A. Alsulaimani, K. D. Chowdhury et al., Detection of verbal multi-word expressions via conditional random fields with syntactic dependency features and semantic re-ranking, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.114-120, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01520762

R. Mcdonald, J. Nivre, Y. Quirmbach-brundage, Y. Goldberg, D. Das et al., Universal dependency annotation for multilingual parsing, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, vol.2, pp.92-97, 2013.

A. Nasr, C. Ramisch, J. Deulofeu, and A. Valli, Joint dependency parsing and multiword expression tokenization, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol.1, pp.1116-1126, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01199865

L. Nerima, V. Foufi, and E. Wehrli, Parsing and MWE detection: Fips at the PARSEME shared task, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.54-59, 2017.

J. Nivre, M. De-marneffe, F. Ginter, Y. Goldberg, J. Hajič et al., Universal Dependencies v1: a multilingual treebank collection, Proceedings of the Tenth International Conference on Language Resources and Evaluation, pp.1659-1666, 2016.

M. Ogrodniczuk, K. Głowińska, M. Kopeć, A. Savary, and M. Zawisławska, Coreference in Polish: Annotation, Resolution and Evaluation, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01174653

A. Przepiórkowski, M. Bańko, R. L. Górski, B. Lewandowska-Tomaszczyk, M. Łaziński et al., National Corpus of Polish, Proceedings of the 5th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, pp.259-263, 2011.

B. Qasemizadeh and S. Rahimi, Persian in MULTEXT-East framework, Advances in Natural Language Processing, pp.541-551, 2006.

V. Rosén, G. S. Losnegaard, K. De Smedt, E. Bejček, A. Savary et al., A survey of multiword expressions in treebanks, Proceedings of the 14th International Workshop on Treebanks & Linguistic Theories conference, 2015.

I. A. Sag, T. Baldwin, F. Bond, A. Copestake, and D. Flickinger, Multiword Expressions: A Pain in the Neck for NLP, Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2002), pp.1-15, 2002.

A. Savary, M. Sailer, Y. Parmentier, M. Rosner, V. Rosén et al., PARSEME-PARSing and Multiword Expressions within a European multilingual network, 7th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01223349

A. Savary, C. Ramisch, S. Cordeiro, F. Sangati, V. Vincze et al., The PARSEME shared task on automatic identification of verbal multiword expressions, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.31-47, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01865575

A. Savary, M. Candito, V. Barbu Mititelu, E. Bejček, F. Cap et al., I. Stoyanova, and V. Vincze, PARSEME multilingual corpus of verbal multiword expressions, Multiword expressions at length and in depth, 2017 (forthcoming).

N. Schneider, D. Hovy, A. Johannsen, and M. Carpuat, SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM), Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pp.546-559, 2016.

K. I. Simkó, V. Kovács, and V. Vincze, USzeged: Identifying verbal multiword expressions with POS tagging and parsing techniques, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.48-53, 2017.

M. Taulé, A. Peris, and H. Rodríguez, Iarg-AnCora: Spanish corpus annotated with implicit arguments, Language Resources and Evaluation, vol.50, issue.3, pp.549-584, 2016.

V. Vincze, J. Zsibrita, and I. Nagy T., Dependency parsing for identifying Hungarian light verb constructions, Proceedings of the Sixth International Joint Conference on Natural Language Processing, pp.207-215, 2013.

J. Waszczuk, A. Savary, and Y. Parmentier, Promoting multiword expressions in A* TAG parsing, COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, pp.429-439, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01378903

E. Wehrli, V. Seretan, and L. Nerima, Sentence analysis and collocation identification, Proceedings of the Workshop on Multiword Expressions: from Theory to Applications (MWE 2010), pp.27-35, 2010.

E. Wehrli, The relevance of collocations for parsing, Proceedings of the 10th Workshop on Multiword Expressions (MWE), pp.26-32, 2014.

Appendix A: Composition of the corpus annotation teams

T. Dimitrova, S. Leseva, V. Stefanova, M. Todorova; (HR) Maja Buljan et al.; Polona Gantar (LL), Simon Krek (LL), Špela Arhar Holdt, Jaka Čibej, Teja Kavčič, Taja Kuzman. Germanic languages

M. Candito (LL), M. Constant, C. Ramisch, C. Pasquer et al. Other languages: (AR) Abdelati Hawwari (LL)

L. , I. Aduriz, A. Estarrona, I. Gonzalez, A. Gurrutxaga et al., Stella Papadelli; (FA) Behrang QasemiZadeh (LL), Proceedings of the ECML Workshop on Mining and Learning in Graphs, 2006.

F. Jousse, XML Tree Transformations with Probabilistic Models. Theses, 2007.
URL : https://hal.archives-ouvertes.fr/tel-00342649

J. D. Lafferty, A. Mccallum, and F. C. Pereira, Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proceedings of the Eighteenth International Conference on Machine Learning, ICML '01, pp.282-289, 2001.

T. Lavergne, O. Cappé, and F. Yvon, Practical very large scale CRFs, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pp.504-513, 2010.

A. Maldonado, L. Han, E. Moreau, A. Alsulaimani, K. D. Chowdhury et al., Detection of Verbal Multi-Word Expressions via Conditional Random Fields with Syntactic Dependency Features and Semantic Re-Ranking, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.114-120, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01520762

E. Moreau and C. Vogel, Multilingual Word Segmentation: Training Many Language-Specific Tokenizers Smoothly Thanks to the Universal Dependencies Corpus, Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01822151

E. Moreau, A. Alsulaimani, A. Maldonado, L. Han, C. Vogel et al., Semantic Re-Ranking of CRF Label Sequences for Verbal Multiword Expression Identification, 2018.

C. Ramisch, S. R. Cordeiro, A. Savary, V. Vincze, A. Verginica-barbu-mititelu et al., Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions, Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01865575

A. Savary, C. Ramisch, S. R. Cordeiro, F. Sangati, V. Vincze, B. QasemiZadeh, M. Candito, F. Cap, V. Giouli, I. Stoyanova, and A. Doucet, The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions, Proceedings of the 13th Workshop on Multiword Expressions, pp.31-47, 2017.

C. Sutton and A. Mccallum, An Introduction to Conditional Random Fields, Found. Trends Mach. Learn, vol.4, issue.4, pp.267-373, 2012.

M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen et al., TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems, software available from tensorflow.org, 2015.

T. Baldwin and S. Kim, Multiword expressions. Handbook of natural language processing, vol.2, pp.267-292, 2010.

F. Chollet, , 2015.

M. Constant, G. Eryiğit, J. Monti et al., Multiword expression processing: a survey, Computational Linguistics, vol.43, issue.4, pp.837-892, 2017.

E. Grave, P. Bojanowski, P. Gupta, A. Joulin, and T. Mikolov, Learning Word Vectors for 157 Languages, Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), 2018.

A. Graves, A. Mohamed, and G. Hinton, Speech recognition with deep recurrent neural networks, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6645-6649, 2013.

Z. Huang, W. Xu, and K. Yu, Bidirectional LSTM-CRF models for sequence tagging, 2015.

N. Klyueva, A. Doucet, and M. Straka, Neural Networks for Multi-Word Expression Detection, p.60, 2017.

J. Lafferty, A. Mccallum, and F. Pereira, Conditional random fields: Probabilistic models for segmenting and labeling sequence data, 2001.

J. Legrand and R. Collobert, Phrase representations for multiword expressions, Proceedings of the 12th Workshop on Multiword Expressions, 2016.
DOI : 10.18653/v1/w16-1810

URL : https://doi.org/10.18653/v1/w16-1810

C. Ramisch, S. R. Cordeiro, A. Savary, V. Vincze, A. Verginica-barbu-mititelu et al., Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions, Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01865575

N. Reimers and I. Gurevych, Reporting score distributions makes a difference: Performance study of lstm-networks for sequence tagging, 2017.

N. Schneider, E. Danchik, C. Dyer, and N. Smith, Discriminative lexical semantic segmentation with gaps: running the MWE gamut, Transactions of the Association for Computational Linguistics, vol.2, pp.193-206, 2014.

How do externally computed word embeddings influence the performance of this methodology on MWE detection?

Will this graph-based decoding strategy have a positive impact on standard or domain-specific NER?

What is the source of the lower F-scores on languages such as Hungarian, German and Hindi that, at first glance, have enough training data to support our approach?

How will this method work for other NLP tasks that involve sparse and long-range dependencies between words, a good example being co-reference resolution?

Whether or not parsing accuracy is sufficient to support MWE identification is a different question. Also, given that our system is inspired by parsing, our intuition is that parsing will not enhance results. On a related note, NLP-Cube has end-to-end processing capabilities from raw text to UD format. This means that it can be used for MWE detection without requiring external CUPT files. Anyone interested can check the end-to-end raw text processing capacity of NLP-Cube in this year's shared task on Universal Dependencies parsing.

H. Al Saied, M. Constant, and M. Candito, The ATILF-LLF system for PARSEME shared task: a transition-based verbal multiword expression tagger, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.127-132, 2017.

J. P. C. Chiu and E. Nichols, Named entity recognition with bidirectional LSTM-CNNs, 2015.

A. Graves, Supervised sequence labelling with recurrent neural networks, vol.385, 2012.

A. Graves and J. Schmidhuber, Offline handwriting recognition with multidimensional recurrent neural networks, Advances in neural information processing systems, pp.545-552, 2009.

E. Kiperwasser and Y. Goldberg, Simple and accurate dependency parsing using bidirectional lstm feature representations, 2016.

G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, Neural architectures for named entity recognition, 2016.

R. Mcdonald, J. Nivre, Y. Quirmbach-brundage, Y. Goldberg, D. Das et al., Universal dependency annotation for multilingual parsing, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, vol.2, pp.92-97, 2013.

C. Ramisch, S. R. Cordeiro, A. Savary, V. Vincze, A. Verginica-barbu-mititelu et al., Edition 1.1 of the parseme shared task on automatic identification of verbal multiword expressions, Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01865575

A. Savary, C. Ramisch, S. Cordeiro, F. Sangati, V. Vincze et al., The parseme shared task on automatic identification of verbal multiword expressions, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.31-47, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01865575

Y. Shao, C. Hardmeier, and J. Nivre, Multilingual named entity recognition using hybrid neural networks, The Sixth Swedish Language Technology Conference (SLTC), 2016.

M. Straka, J. Hajič, and J. Straková, UDPipe: Trainable pipeline for processing CoNLL-U files performing tokenization, morphological analysis, POS tagging and parsing, Language Resources and Evaluation Conference, p.260, 2016.

Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), pp.261-267

M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen et al., TensorFlow: Large-scale machine learning on heterogeneous systems, software available from tensorflow.org, 2015.

H. Al Saied, M. Constant, and M. Candito, The ATILF-LLF system for PARSEME shared task: a transition-based verbal multiword expression tagger, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.127-132, 2017.

T. Baldwin and S. Kim, Multiword expressions. Handbook of natural language processing, vol.2, pp.267-292, 2010.

P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, Enriching word vectors with subword information, 2016.

T. Boroş, S. Pipa, V. Barbu Mititelu, and D. Tufiş, A data-driven approach to verbal multiword expression detection. PARSEME shared task system description paper, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.121-126, 2017.

J. Chiu and E. Nichols, Named entity recognition with bidirectional lstm-cnns, Transactions of the Association for Computational Linguistics, vol.4, pp.357-370, 2016.

F. Chollet, , 2015.

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural computation, vol.9, issue.8, pp.1735-1780, 1997.

N. Klyueva, A. Doucet, and M. Straka, Neural networks for multi-word expression detection, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.60-65, 2017.

M. Labeau, K. Löser, and A. Allauzen, Non-lexical neural architecture for fine-grained pos tagging, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp.232-237, 2015.

A. Maldonado, L. Han, E. Moreau, A. Alsulaimani, K. D. Chowdhury et al., Detection of verbal multi-word expressions via conditional random fields with syntactic dependency features and semantic re-ranking, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.114-120, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01520762

L. Nerima, V. Foufi, and E. Wehrli, Parsing and mwe detection: Fips at the parseme shared task, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.54-59, 2017.

C. Ramisch, S. R. Cordeiro, A. Savary, V. Vincze, A. Verginica-barbu-mititelu et al., Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions, Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01865575

R. Řehůřek and P. Sojka, Software Framework for Topic Modelling with Large Corpora, Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp.45-50, 2010.

R. Schäfer and F. Bildhauer, Building large corpora from the web using a new efficient tool chain, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), 2012.

K. I. Simkó, V. Kovács, and V. Vincze, Uszeged: Identifying verbal multiword expressions with pos tagging and parsing techniques, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.48-53, 2017.

D. Chen and C. Manning, A fast and accurate dependency parser using neural networks, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.740-750, 2014.

F. Chollet, , 2015.

M. Constant and J. Nivre, A transition-based system for joint lexical and syntactic analysis, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.161-171, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01808689

R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, LIBLINEAR: A library for large linear classification, Journal of Machine Learning Research, vol.9, pp.1871-1874, 2008.

N. Klyueva, A. Doucet, and M. Straka, Neural networks for multi-word expression detection, Proceedings of the 13th Workshop on Multiword Expressions, pp.60-65, 2017.

S. Kübler, R. Mcdonald, and J. Nivre, Dependency Parsing, 2009.

A. Maldonado, L. Han, E. Moreau, A. Alsulaimani, K. D. Chowdhury et al., Detection of verbal multi-word expressions via conditional random fields with syntactic dependency features and semantic re-ranking, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.114-120, 2017.
DOI : 10.18653/v1/w17-1715

URL : https://hal.archives-ouvertes.fr/hal-01520762

J. Nivre, Incrementality in deterministic dependency parsing, Proceedings of the ACL Workshop Incremental Parsing: Bringing Engineering and Cognition Together, pp.50-57, 2004.
DOI : 10.3115/1613148.1613156

URL : http://dl.acm.org/ft_gateway.cfm?id=1613156&type=pdf

S. Poria, E. Cambria, and A. Gelbukh, Deep convolutional neural network textual features and multiple kernel learning for utterance-level multimodal sentiment analysis, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp.2539-2544, 2015.
DOI : 10.18653/v1/d15-1303

URL : https://doi.org/10.18653/v1/d15-1303

B. Qasemizadeh and L. Kallmeyer, Random positive-only projections: Ppmi-enabled incremental semantic space construction, Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics, pp.189-198, 2016.

B. Qasemizadeh and L. Kallmeyer, HHU at SemEval-2017 task 2: Fast hash-based embeddings for semantic word similarity assessment, Proceedings of the 11th International Workshop on Semantic Evaluation, 2017.

B. Qasemizadeh, L. Kallmeyer, and P. Passban, Sketching word vectors through hashing, 2017.

C. Ramisch, S. R. Cordeiro, A. Savary, V. Vincze, A. Verginica-barbu-mititelu et al., Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions, Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01865575

A. Sharif-razavian, H. Azizpour, J. Sullivan, and S. Carlsson, Cnn features off-the-shelf: An astounding baseline for recognition, Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW '14, pp.512-519, 2014.

H. Al Saied, M. Constant, and M. Candito, The ATILF-LLF system for PARSEME shared task: a transition-based verbal multiword expression tagger, Proceedings of the 13th Workshop on Multiword Expressions, pp.127-132, 2017.

A. Savary, C. Ramisch, S. Cordeiro, F. Sangati, V. Vincze et al., The parseme shared task on automatic identification of verbal multiword expressions, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.31-47, 2017.
DOI : 10.18653/v1/w17-1704

URL : https://hal.archives-ouvertes.fr/hal-01865575

Sibmwe
Siblem
Binlem: unordered pair {(l_v, m_v)
Parmwedep
Sibmwedep
Parlemposdep
Siblempos
Parposlemdep
Sibposlem

We used the above set of templates consistently for all languages and for all VMWE categories, except for Lithuanian, for which dependency trees were not available. We therefore converted each sentence in the LT dataset to a pseudo-dependency tree in which (i) the first token is the root and (ii) every other token is the child of the preceding token, thus obtaining a model equivalent to a second-order sequential CRF. We also adapted the default set of templates to Lithuanian by replacing the sibling templates with selected grandparent-related templates.
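The chain conversion described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `to_chain_tree` and the `(id, form, head)` row layout are assumptions, with head 0 standing for the artificial root as in CoNLL-U.

```python
from typing import List, Tuple


def to_chain_tree(forms: List[str]) -> List[Tuple[int, str, int]]:
    """Build a pseudo-dependency tree for one sentence.

    Token 1 is attached to the artificial root (head 0); every other
    token i is attached to the preceding token i - 1.
    """
    rows = []
    for idx, form in enumerate(forms, start=1):
        head = idx - 1  # 0 for the first token, i.e. the root
        rows.append((idx, form, head))
    return rows


if __name__ == "__main__":
    for row in to_chain_tree(["a", "b", "c"]):
        print(row)
```

In such a chain, the parent of token i is token i - 1 and its grandparent is token i - 2, which is why parent and grandparent templates together cover the same context as a second-order sequential model.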

The pre-processing method most often applied, case lifting, consisted in reattaching case dependents to their grandparents so as to make MWEs of certain categories (notably, inherently adpositional verbs) connected. We applied it to BG, DE, EN, ES, HI, HR, PL [...]. In the case of Slovak, we relied on language-specific POS tags rather than universal tags.
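Case lifting as described above can be illustrated with a short sketch. The function name `lift_case` and the dictionary-based tree encoding are assumptions made for this example. It also assumes that a case dependent is left in place when its parent is attached directly to the artificial root (head 0), since lifting it would create a second root; the source does not say how that situation was handled.

```python
from typing import Dict


def lift_case(heads: Dict[int, int], deprels: Dict[int, str]) -> Dict[int, int]:
    """Reattach every 'case' dependent to its grandparent.

    heads maps token id -> head id (0 = artificial root);
    deprels maps token id -> dependency relation label.
    Lookups use the original tree, so each token moves up one level at most.
    """
    new_heads = dict(heads)
    for tok, rel in deprels.items():
        if rel != "case":
            continue
        parent = heads[tok]
        if parent == 0:
            continue  # token is itself the root: nothing to lift
        grandparent = heads[parent]
        if grandparent != 0:  # assumption: never attach to the artificial root
            new_heads[tok] = grandparent
    return new_heads


if __name__ == "__main__":
    # "rely on friends": rely(1) is the root, on(2) is a case dependent
    # of friends(3), and friends(3) depends on rely(1).
    heads = {1: 0, 2: 3, 3: 1}
    deprels = {1: "root", 2: "case", 3: "obl"}
    print(lift_case(heads, deprels))
```

After lifting, the adposition is a direct child of the verb, so a verb and its adposition form a connected pair in the tree.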


Table 1: Detailed results of TRAVERSAL for 19 languages (identified by their ISO 639-1 codes).

Tokens with an unspecified dependency head are attached to the artificial root node (with ID=0). The same pre-processing steps were applied to TRAIN, DEV, and (blind) TEST data.

Segmentation. Once the labeling of a given dependency tree is determined, we need to determine the boundaries of the detected MWEs. To this end, we considered two heuristics: (i) all adjacent nodes marked as MWEs of the same category are considered as a single MWE occurrence, and (ii) if a group of adjacent nodes is marked as MWEs but it contains two [...] MWEs. We applied the first heuristic for all languages except Farsi, where the second heuristic yielded better results, notably due to a relatively high frequency of neighboring MWEs in the FA dataset.
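The first segmentation heuristic can be sketched as a connected-components search over the labeled tree. This is an illustration under stated assumptions, not the system's implementation: the function name `segment` is hypothetical, and two nodes count as adjacent here only when they are linked by a parent-child edge (sibling adjacency is left out for brevity).

```python
from typing import Dict, List, Optional, Tuple


def segment(heads: Dict[int, int],
            labels: Dict[int, Optional[str]]) -> List[Tuple[str, List[int]]]:
    """Group adjacent nodes carrying the same MWE category into occurrences.

    heads maps token id -> head id (0 = artificial root);
    labels maps token id -> VMWE category, or None for unlabeled tokens.
    """
    # Keep only tree edges whose two endpoints share a category.
    adjacent = {tok: set() for tok in heads}
    for tok, head in heads.items():
        if head != 0 and labels.get(tok) and labels.get(tok) == labels.get(head):
            adjacent[tok].add(head)
            adjacent[head].add(tok)

    seen = set()
    occurrences = []
    for tok in sorted(heads):
        if labels.get(tok) is None or tok in seen:
            continue
        # Depth-first search collects one connected group of labeled nodes.
        stack, group = [tok], []
        seen.add(tok)
        while stack:
            cur = stack.pop()
            group.append(cur)
            for nb in adjacent[cur]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        occurrences.append((labels[tok], sorted(group)))
    return occurrences


if __name__ == "__main__":
    heads = {1: 0, 2: 1, 3: 2, 4: 1}
    labels = {1: "VID", 2: "VID", 3: None, 4: "LVC.full"}
    print(segment(heads, labels))
```

Under this heuristic, two distinct MWEs of the same category that happen to be adjacent are merged into one occurrence, which is the case the second heuristic is meant to address.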

For each language, the MWE-based and token-based precision (P), recall (R), and F1 (F1) scores are reported, as well as the rank (Rank) of our system, and the difference (Delta) between TRAVERSAL's F1 score and the score of the other best closed-track system. The datasets with dependencies annotated manually, partially manually, or not at all are each marked with a distinct symbol. For the other datasets/sentences, dependencies were obtained automatically. Con is the % of connected (via parental or sibling relation) VMWEs in the TRAIN+DEV dataset (no value ⇒ Con = Con_p), and Con_p is the same measure after pre-processing. Finally, Iso_p is the % of connected and isolated (with no adjacent VMWEs of the same category) VMWEs after pre-processing, for which the baseline segmentation heuristic is sufficient.

[...] according to both official evaluation measures: MWE-based F1 and token-based F1. Table 1 summarizes the performance of our system across 19 languages of the shared task (all except Arabic). Language-wise, our system performed particularly well for Slavic and Romance languages, which is likely related to our choice of Polish and French for feature template engineering. FA was the most [...]

A. Abeillé and Y. Schabes, Parsing Idioms in Lexicalized TAGs, pp.1-9, 1989.

E. Bejček, J. Panevová, J. Popelka, P. Straňák, M. Ševčíková et al., Prague Dependency Treebank 2.5 - a revisited version of PDT 2.0, Proceedings of the 24th International Conference on Computational Linguistics, pp.231-246, 2012.

M. Candito and M. Constant, Strategies for Contiguous Multiword Expression Analysis and Dependency Parsing, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, vol.1, pp.743-753, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01022415

M. Constant and J. Nivre, A Transition-Based System for Joint Lexical and Syntactic Analysis, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.161-171, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01808689

M. Constant, A. Sigogne, and P. Watrin, Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.204-212, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00790613

G. Gallo, G. Longo, S. Pallottino, and S. Nguyen, Directed Hypergraphs and Applications, Discrete Appl. Math, vol.42, issue.2-3, pp.177-201, 1993.

S. Green, M. De-marneffe, and C. D. Manning, Parsing Models for Identifying Multiword Expressions, Computational Linguistics, vol.39, issue.1, pp.195-227, 2013.

A. K. Joshi and Y. Schabes, Tree-Adjoining Grammars, in G. Rozenberg and A. Salomaa (eds.), pp.69-123, 1997.

D. Klein and C. D. Manning, Parsing and Hypergraphs, Seventh International Workshop on Parsing Technologies (IWPT-2001), 2001.

S. Kübler, R. Mcdonald, and J. Nivre, Dependency parsing, Synthesis Lectures on Human Language Technologies, vol.1, issue.1, pp.1-127, 2009.

J. Le-roux, A. Rozenknop, and M. Constant, Syntactic Parsing and Compound Recognition via Dual Decomposition: Application to French, Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp.1875-1885, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01074298

A. Maldonado, L. Han, E. Moreau, A. Alsulaimani, K. D. Chowdhury et al., Detection of Verbal Multi-Word Expressions via Conditional Random Fields with Syntactic Dependency Features and Semantic Re-Ranking, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.114-120, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01520762

R. Mcdonald and F. Pereira, Online learning of approximate dependency parsing algorithms, 11th Conference of the European Chapter of the Association for Computational Linguistics, 2006.

R. Mcdonald and G. Satta, On the complexity of non-projective data-driven dependency parsing, Proceedings of the Tenth International Conference on Parsing Technologies, pp.121-132, 2007.

I. Nagy T. and V. Vincze, VPCTagger: Detecting Verb-Particle Constructions With Syntax-Based Methods, Proceedings of the 10th Workshop on Multiword Expressions (MWE), pp.17-25, 2014.

A. Nasr, C. Ramisch, J. Deulofeu, and A. Valli, Joint Dependency Parsing and Multiword Expression Tokenization, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol.1, pp.1116-1126, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01199865

N. Qian, On the momentum term in gradient descent learning algorithms, Neural Networks, vol.12, issue.1, pp.145-151, 1999.

C. Ramisch, S. R. Cordeiro, A. Savary, V. Vincze, A. Verginica-barbu-mititelu et al., Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions, Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02152557

S. Ruder, An overview of gradient descent optimization algorithms, 2016.

M. Scholivet and C. Ramisch, Identification of Ambiguous Multiword Expressions Using Sequence Models and Lexical Resources, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.167-175, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01795903

C. Sutton and A. Mccallum, An introduction to conditional random fields, Foundations and Trends in Machine Learning, vol.4, issue.4, pp.267-373, 2012.

V. Vincze, I. Nagy T., and J. Zsibrita, Learning to Detect English and Hungarian Light Verb Constructions, ACM Trans. Speech Lang. Process, vol.10, issue.2, pp.1-6, 2013.

J. Waszczuk, A. Savary, and Y. Parmentier, Promoting multiword expressions in A* TAG parsing, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp.429-439, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01378903

T. Baldwin and S. Kim, Multiword expressions, Handbook of Natural Language Processing, pp.267-292, 2010.

M. Constant, G. Eryiğit, J. Monti, L. van der Plas, C. Ramisch et al., Multiword expression processing: A survey, Computational Linguistics, vol.43, issue.4, pp.837-892, 2017.

C. Pasquer et al., If you've seen some, you've seen them all: Identifying variants of multiword expressions, Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics, The COLING 2018 Organizing Committee, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01866345

Schneider et al., 2014. [...] of the VMWE, and not its span, thus the need for a special tag 'g' to indicate intermediate tokens. During system development, one of our goals was to evaluate different tagging schemes and choose the best one based on performance on the development corpus. Therefore, in addition to the extended BIO scheme, we also tested an adaptation that includes category labels (BIO+cat): 'B' and 'I' tags are concatenated with the provided VMWE category labels (IRV, LVC.full, VID, etc.). The idea is that categories present quite heterogeneous characteristics, so it may help to model and learn them separately in the neural network. This is illustrated in the last row of Figure 1. Finally, we also evaluated our system using an inside-outside scheme similar to the one used in MUMULS. We use CoNLL-U's LEMMA and UPOS fields as input features (falling back to FORM and XPOS, respectively, if the former are absent). Each token's LEMMA and UPOS are converted into one-hot vectors, which are then transformed into embeddings and concatenated. Input LEMMA and UPOS embeddings are pre-initialized on the shared task training corpora and fine-tuned during the training phase. These embeddings are then forwarded to a double bidirectional recurrent layer using gated recurrent units (GRU). Finally, each BIO label prediction is based on a softmax layer that takes as input the concatenation of the GRU cell outputs in both directions for each token.
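
The tagging schemes described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `encode_tags`, the hyphen used to join a tag with its category, and the example sentence with its annotation are all assumptions made for the sketch. It handles a single VMWE per sentence, marks its first lexicalized token 'B', later ones 'I', tokens falling between them 'g', and everything else 'O'.

```python
from typing import List, Optional


def encode_tags(n_tokens: int,
                mwe_positions: List[int],
                category: Optional[str] = None) -> List[str]:
    """Tag one sentence that contains a single VMWE.

    mwe_positions: 0-based indices of the lexicalized VMWE tokens.
    category: if given, it is appended to 'B' and 'I' (BIO+cat variant).
              The '-' separator is an assumption of this sketch.
    """
    tags = ["O"] * n_tokens
    positions = sorted(mwe_positions)
    if not positions:
        return tags

    suffix = f"-{category}" if category else ""
    first, last = positions[0], positions[-1]

    # Everything between the first and last VMWE token starts as a gap token.
    for i in range(first, last + 1):
        tags[i] = "g"

    # Overwrite the lexicalized tokens with B (first) and I (the others).
    tags[first] = "B" + suffix
    for i in positions[1:]:
        tags[i] = "I" + suffix
    return tags


# Illustrative sentence: "She took a quick shower", VMWE = took ... shower
tokens = ["She", "took", "a", "quick", "shower"]
plain = encode_tags(len(tokens), [1, 4])
with_cat = encode_tags(len(tokens), [1, 4], category="LVC.full")
```

Here `plain` uses the extended BIO scheme with the gap tag, while `with_cat` corresponds to the BIO+cat variant, in which only 'B' and 'I' carry the category label.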

M. Constant and A. Sigogne, MWU-aware part-of-speech tagging with a CRF model and lexical resources, Proceedings of the ACL Workshop on Multiword Expressions: From Parsing and Generation to the Real World, pp.49-56, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00621585

N. Klyueva, A. Doucet, and M. Straka, Neural networks for multi-word expression detection, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.60-65, 2017.

A. Maldonado, L. Han, E. Moreau, A. Alsulaimani, K. D. Chowdhury et al., Detection of verbal multi-word expressions via conditional random fields with syntactic dependency features and semantic re-ranking, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.114-120, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01520762

L. Ramshaw and M. Marcus, Text chunking using transformation-based learning, 3rd Workshop on Very Large Corpora, pp.82-94, 1995.

M. Riedl and C. Biemann, Impact of MWE resources on multiword recognition, Proceedings of the 12th Workshop on Multiword Expressions (MWE 2016), pp.107-111, 2016.

A. Savary, C. Ramisch, S. Cordeiro, F. Sangati, V. Vincze et al., The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.31-47, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01865575

N. Schneider, E. Danchik, C. Dyer, and N. A. Smith, Discriminative lexical semantic segmentation with gaps: Running the MWE gamut, Transactions of the Association for Computational Linguistics, vol.2, issue.1, pp.193-206, 2014.

M. Scholivet, C. Ramisch, and B. Favre, Identification d'expressions polylexicales avec réseaux de neurones récurrents, Traitement Automatique des Langues, 2018.