Lexical-Functional Syntax. Blackwell Textbooks in Linguistics, 2015. ,
On understanding idiomatic language: The salience hypothesis assessed by ERPs, Brain Research, vol.1068, issue.1, pp.151-160, 2006. ,
URL : https://hal.archives-ouvertes.fr/hal-01440644
, Modern Information Retrieval, 1999.
The WaCky Wide Web: A collection of very large linguistically processed Web-crawled corpora, Journal of Language Resources and Evaluation, vol.43, pp.209-226, 2009. ,
Effects of writing systems on second language awareness: Word awareness in English learners of Chinese as a foreign language, Second Language Writing Systems. Multilingual Matters, pp.335-356, 2005. ,
The Oxford Handbook of Construction Grammar, pp.255-273, 2013. ,
Neural Word Segmentation Learning for Chinese, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp.409-420, 2016. ,
The IdiomSearch Experiment: Extracting Phraseology from a Probabilistic Network of Constructions, pp.16-28, 2017. ,
Radical Construction Grammar: Syntactic Theory in Typological Perspective, 2001. ,
The Oxford Handbook of Construction Grammar, pp.211-232, 2013. ,
, Word: A Cross-Linguistic Typology, 2002.
Inter-coder agreement for Computational Linguistics, Computational Linguistics, vol.34, issue.4, pp.555-596, 2008. ,
Multiword expressions, Handbook of Natural Language Processing, pp.267-292, 2010. ,
Pearson correlation coefficient, Noise reduction in speech processing, pp.1-4, 2009. ,
A survey of clustering data mining techniques, Grouping multidimensional data, pp.25-71, 2006. ,
Modern Greek comparative constructions: A syntactic analysis of adjectival and adverbial comparatives (in Greek), 1986. ,
Distinguishing subtypes of multiword expressions using linguistically-motivated statistical measures, Proceedings of the Workshop on A Broader Perspective on Multiword Expressions, pp.9-16, 2007. ,
Unsupervised type and token identification of idiomatic expressions, Computational Linguistics, vol.35, issue.1, pp.61-103, 2009. ,
Understanding idiomatic variation, Proceedings of the 13th Workshop on Multiword Expressions, pp.80-90, 2017. ,
Similes and sets: The English preposition "like", Languages and Linguistics: Festschrift for Professor Fr. Čermák. Philosophy Faculty of the Charles University, 2005. ,
On simile, Language, Culture and Mind, pp.123-135, 2004. ,
Local-global vectors to improve unigram terminology extraction, Proceedings of the 5th International Workshop on Computational Terminology (Computerm), pp.2-11, 2016. ,
Enhancing statistical machine translation with bilingual terminology in a CAT environment, Proceedings of the 11th Biennial Conference of the Association for Machine Translation in the Americas (AMTA), pp.54-68, 2014. ,
Inducing terminology for lexical acquisition, Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing (EMNLP), 1997. ,
Designing a Russian Idiom-Annotated Corpus, Proceedings of the Language Resources and Evaluation Conference (LREC), 2018. ,
Strategies for translating idioms, Journal of Academic and Applied Studies (Special Issue on Applied Linguistics), vol.3, issue.8, pp.32-41, 2013. ,
Morphologically Annotated Corpora and Morphological Analyzers for Moroccan and Sanaani Yemeni Arabic, Proceedings of the Language Resources and Evaluation Conference (LREC), 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01349201
Towards interlingual constructicography: On correspondence between constructicon resources for English and Swedish, Constructions and Frames, vol.6, issue.1, pp.9-33, 2014. ,
, In Other Words: A Coursebook on Translation. Routledge, 1992.
Abstract Meaning Representation for Sembanking, Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pp.178-186, 2013. ,
Linking and extending an open multilingual wordnet, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, vol.1, pp.1352-1362, 2013. ,
Abstract Meaning Representation of Constructions: The More We Include, the Better the Representation, Proceedings of the 2018 Language Resources and Evaluation Conference (LREC), 2018. ,
The Multidialectal Parallel Corpus of Arabic, Proceedings of the Language Resources and Evaluation Conference (LREC), 2014. ,
The MADAR Arabic Dialect Corpus and Lexicon, Proceedings of the Language Resources and Evaluation Conference (LREC), 2018. ,
The Syntax of Spoken Arabic. Georgetown University Press, 2002. ,
, Cognitive Linguistics, 2004.
, Diacritization of Moroccan and Tunisian Arabic Dialects: A CRF Approach, Proceedings of the 3rd Workshop on OpenSource Arabic Corpora and Processing Tools, 2018.
The Case for Systematically Derived Spatial Language Usage, Proceedings of the NAACL 2018 Workshop on Spatial Language Understanding (SpLU), 2018. ,
Moroccan Arabic Verb Dictionary, 2011. ,
Unsupervised Type and Token Identification of Idiomatic Expressions, Computational Linguistics, pp.61-103, 2009. ,
, WordNet: An Electronic Lexical Database, 1998.
Idioms and Idiomaticity, p.5, 1996. ,
Regularity and idiomaticity in grammatical constructions: The case of let alone. Language, pp.501-538, 1988. ,
The FrameNet Constructicon. Sign-based construction grammar, pp.309-372, 2012. ,
Border conflicts: FrameNet meets construction grammar, Proceedings of the XIII EURALEX international congress, vol.4968, 2008. ,
From construction candidates to constructicon entries: An experiment using semi-automatic methods for identifying constructions in corpora, Constructions and Frames, vol.6, issue.1, pp.114-135, 2014. ,
Constructions: A construction grammar approach to argument structure, 1995. ,
Constructions: a new theoretical approach to language, TRENDS in Cognitive Sciences, vol.7, 2003. ,
, A Dictionary of Moroccan Arabic: Moroccan-English English-Moroccan, 1966.
Construction of an idiom corpus and its application to idiom identification based on WSD incorporating idiom-specific features, Proceedings of the Empirical Methods for Natural Language Processing Conference (EMNLP), 2008. ,
Building the Moroccan Darija Wordnet (MDW) using Bilingual Resources, Proceedings of the International Conference on Natural Language, Signal and Speech Processing, 2017. ,
Automatic Idiom Identification in Wiktionary, Proceedings of the Empirical Methods for Natural Language Processing Conference (EMNLP), 2013. ,
, Idioms. Language, vol.70, issue.3, pp.491-538, 1994.
Toward constructicon building for Japanese in Japanese FrameNet, Revista Veredas, issue.1, p.17, 2016. ,
Automatic Idiom Recognition with Word Embeddings, Proceedings of the Annual International Symposium on Information Management and Big Data, 2017. ,
A System for Identification of Idioms in Hindi, Seventh International Conference on Contemporary Computing (IC3), 2014. ,
Multiword Expressions: A Pain in the Neck for NLP?, Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing (CICLing'02), 2002. ,
An Arabic-Moroccan Darija Code-Switched Corpus, Proceedings of the Language Resources and Evaluation Conference (LREC), 2016. ,
Detecting Code-Switching in Moroccan Arabic Social Media, SocialNLP Workshop at International Joint Conference on Artificial Intelligence (IJCAI), 2016. ,
Translation of Idioms and Fixed Expressions: Strategies and Difficulties. Theory and Practice in Language Studies, vol.2, 2012. ,
Ethnologue: Languages of the World, Twenty-first edition, 2018. ,
Multilingual Spoken Language Corpus Development for Communication Research, International Journal of Computational Linguistics & Chinese Language Processing: Special Issue, vol.12, issue.3, pp.303-324, 2007. ,
Lexicalization patterns: Semantic structure in lexical forms. Language typology and syntactic description, vol.3, pp.36-149, 1985. ,
Toward a cognitive semantics, vol.2, 2000. ,
Revisiting border conflicts between FrameNet and Construction Grammar: Annotation policies for the Brazilian Portuguese Constructicon, Constructions and Frames, vol.6, issue.1, pp.34-51, 2014. ,
The Lexicon-Grammar of Italian Idioms, Proceedings of the International Conference on Computational Linguistics (COLING), 2014. ,
DOI : 10.3115/v1/w14-5817
URL : https://hal.archives-ouvertes.fr/hal-01414494
Finding Romanized Arabic Dialect in Code-Mixed Tweets, Proceedings of the Language Resources and Evaluation Conference (LREC), 2014. ,
"Arabizi": A Contemporary Style of Arabic Slang, Design Issues, vol.24, issue.2, pp.39-52, 2008. ,
Machine Translation of Arabic Dialects, Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT '12, pp.49-59, 2012. ,
Automatic conversion of the Basque dependency treebank to universal dependencies, Proceedings of the Workshop on Treebanks and Linguistic Theories, pp.233-241, 2015. ,
Automatic morphological analysis of Basque, Literary and Linguistic Computing, vol.11, pp.193-203, 1996. ,
Representation and treatment of Multiword Expressions in Basque, Proceedings of the Workshop on Multiword Expressions: Integrating Processing, pp.48-55, 2004. ,
Manual de fraseología española, 1997. ,
Valency and argument structure in the Basque verb, 2003. ,
Automatic extraction of NV expressions in Basque: basic issues on cooccurrence techniques, Proceedings of the Workshop on Multiword Expressions: from parsing and generation to the real world, pp.2-7, 2011. ,
Combining different features of idiomaticity for the automatic classification of noun+verb expressions in Basque, Proceedings of the 9th Workshop on Multiword Expressions, pp.116-125, 2013. ,
Rule-based translation of Spanish Verb-Noun combinations into Basque, Proceedings of the 13th Workshop on Multiword Expressions, pp.149-154, 2017. ,
Analysing linguistic information about word combinations for a Spanish-Basque rule-based machine translation system, Multiword Units in Machine Translation and Translation Technologies, pp.39-60 ,
Orotariko Euskal Hiztegia. Euskaltzaindia, the Royal Academy of the Basque language, 1987. ,
A brief grammar of Euskera, the Basque language, 1996. ,
PARSEME survey on MWE resources, 9th International Conference on Language Resources and Evaluation (LREC 2016), pp.2299-2306, 2016. ,
Multiword expressions: a pain in the neck for NLP, International Conference on Intelligent Text Processing and Computational Linguistics, pp.1-15, 2002. ,
PARSEME: PARSing and Multiword Expressions within a European multilingual network, 7th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, 2015. ,
The PARSEME Shared Task on automatic identification of Verbal Multiword Expressions, Proceedings of the 13th Workshop on Multiword Expressions, pp.31-47, 2017. ,
Edition 1.1 of the PARSEME Shared Task on automatic identification of Verbal Multiword Expressions, Proceedings of the 14th Workshop on Multiword Expressions, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-02152557
Euskal lokuzioen tratamendu konputazionala, 2012. ,
Los predicados complejos en vasco, Las fronteras de la composición en lenguas románicas y en vasco, pp.445-534, 2004. ,
On verbs and time. Doctoral dissertation, 1985. ,
Deep semantic analysis of text, Proc. of the 2008 Conference on Semantics in Text Processing, STEP '08, pp.343-354, 2008. ,
Towards a general theory of action and time, Artificial Intelligence, vol.23, pp.123-54, 1984. ,
The 'perfect' as a universal and as a language-specific category, Tense-aspect: Between semantics and pragmatics, pp.227-264, 1982. ,
The algebra of events, Linguistics and philosophy, vol.9, issue.1, pp.5-16, 1986. ,
Abstract Meaning Representation for sembanking, Proc. of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pp.178-186, 2013. ,
Studies in Language. International Journal sponsored by the Foundation "Foundations of Language", vol.18, pp.23-44, 1994. ,
Annotating temporal and event quantification, Proc. of 5th ISA Workshop, 2010. ,
A dynamic model of aspectual composition, Proc. of CogSci, pp.226-231, 1998. ,
A motor-and image-schematic analysis of aspectual composition, 1997. ,
Aspect, 1976. ,
Tense, 1985. ,
Verbs: Aspect and Causal Structure, 2012. ,
The effects of aspectual class on the temporal structure of discourse: semantics or pragmatics?, Linguistics and Philosophy, vol.9, issue.1, pp.37-61, 1986. ,
Automatic prediction of aspectual class of verbs in context, Proc. of ACL, 2014. ,
Situation entity types: Automatic classification of clause-level aspect, Proc. of ACL, pp.1757-1768, 2016. ,
Temporal anaphora in discourses of English, Linguistics and philosophy, vol.9, issue.1, pp.63-82, 1986. ,
Time in language, 1994. ,
Remarks on English aspect, Tense-aspect: Between semantics and pragmatics, pp.265-304, 1982. ,
Foundations of cognitive grammar: Theoretical prerequisites, vol.1, 1987. ,
English verb classes and alternations: A preliminary investigation, 1993. ,
Annotating The Little Prince with Chinese AMRs, Proc. of LAW X-the 10th Linguistic Annotation Workshop, pp.7-15, 2016. ,
Supervised categorization for habitual versus episodic sentences, Sixth Midwest Computational Linguistics Colloquium, 2009. ,
Text generation and systemic-functional linguistics: Experiences from English and Japanese, 1991. ,
Annotating Abstract Meaning Representations for Spanish, Stelios Piperidis, and Takenobu Tokunaga, pp.3074-3078, 2018. ,
Temporal ontology and temporal reference, Computational Linguistics, vol.14, issue.2, pp.15-28, 1988. ,
CaTeRS: Causal and temporal relation scheme for semantic annotation of event structures, Proc. of the Fourth Workshop on Events, pp.51-61, 2016. ,
Richer Event Description: Integrating event coreference with temporal, causal and bridging annotation, Proc. of the 2nd Workshop on Computing News Storylines, pp.47-56, 2016. ,
AMR beyond the sentence: the Multi-sentence AMR corpus, Proc. of COLING, 2018. ,
The Proposition Bank: An annotated corpus of semantic roles, Computational Linguistics, vol.31, issue.1, pp.71-106, 2005. ,
Mood and modality, 2001. ,
Nominal and temporal semantic structure: Aspect and quantification, vol.3, p.91, 1999. ,
The progressive in modal semantics, Language, vol.74, issue.4, p.760, 1998. ,
TimeML: Robust specification of event and temporal expressions in text, IWCS-5, Fifth International Workshop on Computational Semantics, 2003. ,
Designing annotation schemes: From theory to model, Handbook of Linguistic Annotation, pp.21-72, 2017. ,
ISO-Space: Annotating static and dynamic spatial information, Handbook of Linguistic Annotation, pp.989-1024, 2017. ,
Tense sense disambiguation: A new syntactic polysemy task, Proc. of the 2010 Conference on Empirical Methods in Natural Language Processing, pp.325-334, 2010. ,
Elements of symbolic logic, 1947. ,
Telicity, atomicity and the Vendler classification of verbs. Theoretical and Crosslinguistic Approaches to Aspect, pp.43-77, 2008. ,
Time with and without tense, Time and Modality, Studies in Natural Language and Linguistic Theory, pp.227-249, 2008. ,
Verbs and times, The Philosophical Review, vol.66, pp.143-60, 1957. ,
Not an interlingua, but close: comparison of English AMRs to Chinese and Czech, Proc. of LREC, pp.1765-1772, 2014. ,
Diagnosing Meaning Errors in Short Answers to Reading Comprehension Questions, Proceedings of the Third Workshop on Innovative Use of NLP for Building Educational Applications, pp.107-115, 2008. ,
Predictive incremental parsing and its evaluation, Computational Dependency Theory, vol.258, pp.186-206, 2013. ,
EAGLE: an Error-Annotated Corpus of Beginning Learner German, Proceedings of the International Conference on Language Resources and Evaluation, 2010. ,
Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.793-805, 2017. ,
A coefficient of agreement for nominal scales, Educational and Psychological Measurement, vol.20, issue.1, pp.37-46, 1960. ,
"Vater und Sohn" im Anfängerunterricht: Eine Hörverstehensübung und ein Schreibauftrag. Fremdsprache Deutsch: Zeitschrift für die Praxis des Deutschunterrichts, vol.5, pp.46-47, 1991. ,
Building a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English, Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, pp.22-31, 2013. ,
Dependency Annotation for Learner Corpora, Proceedings of the Eighth Workshop on Treebanks and Linguistic Theories (TLT-8), pp.59-70, 2009. ,
Towards interlanguage POS annotation for effective learner corpora in SLA and FLT, Special Issue on Corpus Linguistics for Teaching and Learning, vol.36, pp.139-154, 2010. ,
Deutsch mit Vater und Sohn: 10 Bildgeschichten von E. O. Plauen für den Unterricht Deutsch als Fremdsprache, 2001. ,
The Montclair Electronic Language Database Project, Language and Computers, Applied Corpus Linguistics. A Multidimensional Perspective, 2004. ,
Because Size Does Matter: The Hamburg Dependency Treebank, Proceedings of the Language Resources and Evaluation Conference, 2014. ,
Eine umfassende Constraint-Dependenz-Grammatik des Deutschen. Fachbereich Informatik, Hamburg. URN: urn:nbn:de:gbv, pp.18-228, 2006. ,
International Corpus of Learner English. Version 2. Handbook and CD-Rom. Presses universitaires de Louvain, 2009. ,
Far far away in Far Rockaway: Responses to risks and impacts during Hurricane Sandy through first-person social media narratives, Proceedings of the Information Systems for Crisis Response and Management (ISCRAM) Conference, 2016. ,
Assessing state-of-the-art sentiment models on state-of-the-art sentiment datasets, Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp.2-12, 2017. ,
The tornado warning process: A review of current research, challenges, and opportunities, Bulletin of the American Meteorological Society, vol.94, issue.11, pp.1715-1733, 2013. ,
"sometimes da #beachlife ain't always da wave": Understanding people's evolving risk assessments and responses during Hurricane Sandy using Twitter, 2018. ,
"Participant" perceptions of Twitter research ethics, Social Media + Society, vol.4, issue.1, 2018. ,
The limits of crisis data: analytical and ethical challenges of using social and mobile data to understand disasters, GeoJournal, vol.80, issue.4, pp.491-502, 2015. ,
Evacuation decision making and behavioral responses: Individual and household, Natural Hazards Review, vol.8, issue.3, pp.69-77, 2007. ,
Conversations in the eye of the storm: At-scale features of conversational structure in a high-tempo, high-stakes microblogging environment, Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI '18, vol.84, pp.1-84, 2018. ,
Information as intervention: How can hurricane risk communication reduce vulnerability?, 2017. ,
The protective action decision model: Theoretical modifications and additional evidence, Risk Analysis, vol.32, issue.4, pp.616-632, 2012. ,
Communication of Emergency Public Warnings: A Social Science Perspective and State-of-the-Art Assessment. Oak Ridge National Laboratory Rep, 1990. ,
Hazardous weather prediction and communication in the modern information environment, Bulletin of the American Meteorological Society, vol.98, issue.12, pp.2653-2674, 2017. ,
Crisis informatics: New data for extraordinary times, Science, vol.353, issue.6296, pp.224-225, 2016. ,
Social and hydrological responses to extreme precipitations: An interdisciplinary strategy for postflood investigation, vol.6, pp.135-153, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00921452
Building a semantically annotated corpus of clinical texts, Journal of Biomedical Informatics, vol.42, issue.5, pp.950-966, 2009. ,
Bracketing guidelines for Treebank II Style Penn Treebank project, 1995. ,
, The Stanford CoreNLP Natural Language Processing Toolkit. Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp.55-60, 2014.
Towards comprehensive syntactic and semantic annotations of the clinical narrative, Journal of the American Medical Informatics Association, vol.20, issue.5, pp.922-930, 2013. ,
Self-trained biomedical parsing, 2009. ,
Clinical Corpus Annotation: Challenges and Strategies, Proceedings of the Third Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM'2012) under LREC-2012, 2012. ,
Stockholm EPR Corpus: A Clinical Database Used to Improve Health Care, Proceedings of Swedish Language Technology Conference, pp.17-18, 2012. ,
Improving Performance of Natural Language Processing Part-of-Speech Tagging on Clinical Narratives through Domain Adaptation, Journal of the American Medical Informatics Association, vol.20, pp.931-939, 2013. ,
Joint Parsing and Named Entity Recognition, Proceedings of Human Language Technology: 2009 Conference of the North American Chapter of the Association of Computational Linguistics, pp.326-334, 2009. ,
GENIA corpus-A semantically annotated corpus for bio-text mining, Bioinformatics, vol.19, issue.1, pp.180-182, 2003. ,
GENIA Corpus Manual-Encoding schemes for the corpus and annotation, 2006. ,
Syntactic parsing of clinical text: guideline and corpus development with handling ill-formed sentences, Journal of the American Medical Informatics Association, vol.20, issue.6, pp.1168-1177, 2013. ,
Corpus design for biomedical natural language processing, Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, pp.38-45, 2005. ,
Parsing clinical text: how good are the state-of-the-art parsers?, BMC Medical Informatics and Decision Making, vol.15, issue.1, p.2, 2015. ,
Building a Large Annotated Corpus of English: The Penn Treebank, Computational Linguistics, vol.19, pp.313-330, 1993. ,
Annotating a Large Representative Corpus of Clinical Notes for Parts of Speech, Proceedings of 8th Linguistic Annotation Workshop, pp.87-92, 2014. ,
Aspects of the theory of syntax, 1965. ,
Lectures on government and binding: the Pisa lectures, 1981. ,
The minimalist program, 1995. ,
Building a semantically annotated corpus for congestive heart and renal failure from clinical records and the literature, Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis, pp.69-74, 2014. ,
ezDI: A supervised NLP system for clinical narrative analysis, Proceedings of the 9th International Workshop on Semantic Evaluation, pp.412-416, 2015. ,
Evaluating syntax performance of parser/grammars, Proceedings of the Natural Language Processing Systems Evaluation Workshop, 1991. ,
Constructing Evaluation Corpora for Automated Clinical Named Entity Recognition, Proceedings of the 12th World Congress on Health, pp.3143-3150, 2007. ,
Building a Text Corpus for Representing the Variety of Medical Language, Studies in health technology and informatics, vol.84, issue.1, pp.290-294, 2001. ,
Developing a corpus of clinical notes manually annotated for part-of-speech, International Journal of Medical Informatics, vol.75, issue.6, pp.418-429, 2006. ,
Learning accurate, compact, and interpretable tree annotation, Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp.433-440, 2006. ,
DOI : 10.3115/1220175.1220230
URL : http://dl.acm.org/ft_gateway.cfm?id=1220230&type=pdf
Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis, Journal of the American Medical Informatics Association, vol.19, issue.e1, pp.149-156, 2012. ,
Clustering clinical trials with similar eligibility criteria features, Journal of Biomedical Informatics, vol.52, pp.112-120, 2014. ,
DOI : 10.1016/j.jbi.2014.01.009
URL : https://doi.org/10.1016/j.jbi.2014.01.009
The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes, Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, pp.38-45, 2008. ,
Anaphoric reference in clinical reports: Characteristics of an annotated corpus, Journal of Biomedical Informatics, vol.45, issue.3, pp.507-521, 2012. ,
DOI : 10.1016/j.jbi.2012.01.010
URL : https://doi.org/10.1016/j.jbi.2012.01.010
Temporal Annotation in the Clinical Domain, Transactions of the Association for Computational Linguistics, vol.2, pp.143-154, 2014. ,
Annotating and recognising named entities in clinical notes, Proceedings of the ACL-IJCNLP 2009 Student Research Workshop, pp.18-26, 2009. ,
DOI : 10.3115/1667884.1667888
URL : http://dl.acm.org/ft_gateway.cfm?id=1667888&type=pdf
Syntax annotation for the GENIA corpus, Companion Volume to the Proceedings of Second international joint conference on natural language processing, pp.220-225, 2005. ,
Part-of-Speech Annotation of Biology Research Abstracts, Proceedings of 4th International Conference on Language Resources and Evaluation (LREC), pp.1267-1270, 2004. ,
Ckylark: A more robust PCFG-LA parser, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics, pp.41-45, 2015. ,
DOI : 10.3115/v1/n15-3009
URL : https://doi.org/10.3115/v1/n15-3009
Simultaneous Identification of Biomedical Named-Entity and Functional Relations Using Statistical Parsing Techniques, Proceedings of Human Language Technology: 2007 Conference of the North American Chapter of the Association of Computational Linguistics, pp.161-164, 2007. ,
DOI : 10.3115/1614108.1614149
URL : http://dl.acm.org/ft_gateway.cfm?id=1614149&type=pdf
Discourse Segmentation for Building a RST Chinese Treebank, Proceedings of the 6th Workshop Recent Advances in RST and Related Formalisms, pp.73-81, 2017. ,
A Corpus-based Approach for Spanish-Chinese Language Learning, Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLP-TEA3), pp.97-106, 2016. ,
An analysis of the Concession relation based on the Spanish discourse marker aunque in a Spanish-Chinese parallel corpus, Procesamiento del Lenguaje Natural, vol.56, pp.81-88, 2016. ,
Toward the Elaboration of a Spanish-Chinese Parallel Annotated Corpus, EPiC Series of Language and Linguistics, vol.2, pp.315-324, 2017. ,
Using Discourse Information for Education with a Spanish-Chinese Parallel Corpus, Proceedings of the 11th edition of the Language Resources and Evaluation Conference (LREC'2018), pp.2254-2261, 2018. ,
Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory, Proceedings of the 2nd SIGDIAL Workshop on Discourse Dialogue, pp.1-10, 2001. ,
Comparing Structures of Essays in Chinese and English, 1985. ,
A Symbolic Corpus-based Approach to Detect and Solve the Ambiguity of Discourse Markers, Research in Computing Science, vol.70, pp.95-106, 2013. ,
Comparing rhetorical structures of different languages: The influence of translation strategies, Discourse Studies, vol.12, issue.5, pp.563-598, 2010. ,
DiSeg 1.0: The First System for Spanish Discourse Segmentation, Expert Systems with Applications (ESWA), vol.39, issue.2, pp.1671-1678, 2012. ,
On the Development of the RST Spanish Treebank, Proceedings of the 5th Linguistic Annotation Workshop, pp.1-10, 2011. ,
The RST Spanish Treebank On-line Interface, Proceedings of Recent Advances in Natural Language Processing, pp.698-703, 2011. ,
On the Role of Discourse Markers for Discriminating Claims and Premises in Argumentative Discourse, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp.2236-2242, 2015. ,
Linearity in Rhetorical Organisation: A Comparative Cross-cultural Analysis of Newstext from the People's Republic of China and Australia, International Journal of Applied Linguistics, vol.10, issue.2, pp.241-58, 2000. ,
What Are They Getting At? Placement of Important Ideas in Chinese Newstext: A Contrastive Analysis with Australian Newstext, Australian Review of Applied Linguistics, vol.24, issue.2, pp.17-34, 2001. ,
Toward a 'Science' of Corpus Annotation: A New Methodology Challenges for Corpus Linguistics, International Journal of Translation, vol.22, issue.1, pp.13-36, 2010. ,
Deliberation as Genre: Mapping Argumentation through Relational Discourse Structure, Proceedings of the 6th Workshop Recent Advances in RST and Related Formalisms, pp.1-10, 2017. ,
The RST Basque TreeBank: an online search interface to check rhetorical relations, Proceedings of IV Workshop A RST e os Estudos do Texto, pp.40-49, 2013. ,
The annotation of the Central Unit in Rhetorical Structure Trees: A Key Step in Annotating Rhetorical Relations, Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp.466-475, 2014. ,
A Qualitative Comparison Method for Rhetorical Structures: Identifying different discourse structures in multilingual corpora. Language resources and evaluation, vol.49, pp.263-309, 2015. ,
Detecting the central units in two different genres and languages: a preliminary study of Brazilian Portuguese and Basque texts, Procesamiento de Lenguaje Natural, vol.56, pp.65-72, 2016. ,
Elementary Discourse Unit in Chinese Discourse Structure Analysis, Chinese Lexical Semantics, vol.7717, pp.186-198, 2012. ,
Rhetorical Structure Theory: Toward a functional theory of text organization, Text&Talk, vol.8, issue.3, pp.243-281, 1988. ,
The rhetorical parsing of unrestricted texts: A surface-based approach, Computational Linguistics, vol.26, issue.3, pp.395-448, 2000. ,
RSTTool 2.4-A Markup Tool For Rhetorical Structure Theory, Proceedings of First International Conference on Natural Language Generation (INLG'2000), pp.253-256, 2000. ,
Software vai melhorar compreensão de textos em computadores, 2005. ,
Dizer: An Automatic Discourse Analyzer for Brazilian Portuguese, Lecture Notes in Artificial Intelligence, vol.3171, pp.224-234, 2008. ,
Rhetalho: um corpus de referência anotado retoricamente. Anais do V Encontro de Corpora, 2005. ,
The Penn Discourse Treebank 2.0, Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC'2008), pp.2961-2968, 2008. ,
, Jiyu xiucijiegoulilun de hanyuxinwenpinglun yupianjiegou yanjiu, 2010.
United Nations general assembly resolutions: A six-languages parallel corpus, Proceedings of Machine Translation Summit XII, pp.292-299, 2009. ,
The Bible as a Parallel Corpus: Annotating the 'Book of, Computers and the Humanities, vol.33, issue.1-2, pp.129-153, 1999. ,
Potsdam Commentary Corpus 2.0: Annotation for Discourse Research, Proceedings of the International Conference on Language Resources and Evaluation (LREC'2014), pp.925-929, 2014. ,
Discourse Relations Reference Corpus, 2008. ,
Rhetorical relation markers in Russian RST Treebank, Proceedings of 6th Workshop Recent Advances in RST and Related Formalisms, pp.29-33, 2017. ,
Macrostructures: An Interdisciplinary Study of Global Structures in Discourse, Interaction, and Cognition, 1980. ,
Microblogs as Parallel Corpora, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL'2013), pp.176-186, 2013. ,
On Application of computer-based corpora in translation, Proceedings of 2nd International Conference on Computer, Electrical, and Systems Sciences, and Engineering (CESSE' 2014), pp.173-178, 2014. ,
Discursive Usage of Six Chinese Punctuation Marks, Proceedings of the COLING/ACL 2006 Student Research Workshop, pp.43-48, 2006. ,
The CUHK Discourse Treebank for Chinese: Annotating Explicit Discourse Connectives for the Chinese Treebank, Proceedings of the International Conference on Language Resources and Evaluation (LREC'2014), pp.942-949, 2014. ,
rstWeb-A Browser-based Annotation Interface for Rhetorical Structure Theory and Discourse Relations, Proceedings of NAACL-HLT 2016 System Demonstrations, pp.1-5, 2016. ,
Bracketing guidelines for Treebank II style, 1995. ,
, English Web Treebank. LDC2012T13, Linguistic Data Consortium, 2012.
Robust constituent-to-dependency conversion for English, Proceedings of the 9th International Workshop on Treebanks and Linguistic Theories (TLT 2010), pp.55-66, 2010. ,
Stanford typed dependencies manual, 2013. ,
Information structure in cross-linguistic corpora: Annotation guidelines for phonology, morphology, syntax, semantics, and information structure, Interdisciplinary Studies on Information Structure, p.7, 2007. ,
A hybrid grammatical tagger: CLAWS4, Corpus Annotation: Linguistic Information from Computer Text Corpora, pp.102-121, 1997. ,
SPAAC speech-act annotation scheme, 2003. ,
Rhetorical Structure Theory: Toward a functional theory of text organization, Text, vol.8, issue.3, pp.243-281, 1988. ,
The Stanford CoreNLP natural language processing toolkit, Proceedings of ACL 2014: System Demonstrations, pp.55-60, 2014. ,
Normunds Grūzītis, Linh Hà Mỹ, Dag Haug, Barbora Hladká, Petter Hohle, Radu Ion, Elena Irimia ,
, Reference guide for the British National Corpus, 2007.
The VNC-Tokens dataset, Proceedings of the LREC workshop towards a shared task for Multiword Expressions, pp.19-22, 2008. ,
Literal or idiomatic? Identifying the reading of single occurrences of German multiword expressions using word embeddings, Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp.103-112, 2017. ,
Unsupervised type and token identification of idiomatic expressions, Computational Linguistics, vol.35, issue.1, pp.61-103, 2009. ,
Introducing and evaluating ukWaC, a very large web-derived corpus of English, Proceedings of LREC, pp.47-54, 2008. ,
A word embedding approach to identifying verb-noun idiomatic combinations, Proceedings of the 12th Workshop on Multiword Expressions, pp.112-118, 2016. ,
A challenge set approach to evaluating machine translation, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.2476-2486, 2017. ,
SemEval-2013 task 5: Evaluating phrasal semantics, Proceedings of SemEval, pp.39-47, 2013. ,
Phrasal substitution of idiomatic expressions, Proceedings of NAACL, pp.363-373, 2016. ,
GloVe: Global vectors for word representation, Proceedings of EMNLP 2014, pp.1532-1543, 2014. ,
Multiword expressions: A pain in the neck for NLP, Proceedings of CICLING, pp.1-15, 2002. ,
An empirical study of the impact of idioms on phrase based statistical machine translation of English to Brazilian-Portuguese, Proceedings of the 3rd Workshop on Hybrid Approaches to Translation (HyTra), pp.36-41, 2014. ,
Evaluation of a substitution method for idiom transformation in statistical machine translation, Proceedings of the 10th Workshop on Multiword Expressions, pp.38-42, 2014. ,
Unsupervised recognition of literal and non-literal use of idiomatic expressions, Proceedings of EACL, pp.754-762, 2009. ,
Idioms in context: The IDIX corpus, Proceedings of LREC, pp.639-646, 2010. ,
The role of idioms in sentiment analysis, Expert Systems with Applications, vol.42, issue.21, pp.7375-7385, 2015. ,
An empirical model of multiword expression decomposability, Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, pp.89-96, 2003. ,
Multiword expressions. Handbook of natural language processing, vol.2, pp.267-292, 2010. ,
Acquiring phrasal lexicons from corpora, 2006. ,
Large language models in machine translation, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007. ,
Task-based evaluation of multiword expressions: a pilot study in statistical machine translation, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp.242-245, 2010. ,
Reranking answers for definitional qa using language modeling, Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pp.1081-1088, 2006. ,
DOI : 10.3115/1220175.1220311
URL : http://dl.acm.org/ft_gateway.cfm?id=1220311&type=pdf
Learning phrase representations using rnn encoder-decoder for statistical machine translation, 2014. ,
DOI : 10.3115/v1/d14-1179
URL : https://hal.archives-ouvertes.fr/hal-01433235
Discriminative syntactic language modeling for speech recognition, Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pp.507-514, 2005. ,
DOI : 10.3115/1219840.1219903
URL : http://dl.acm.org/ft_gateway.cfm?id=1219903&type=pdf
Stock market prediction with deep learning: A character-based neural language model for event-based trading, Proceedings of the Australasian Language Technology Association Workshop, pp.6-15, 2017. ,
Deep learning models for multiword expression identification, Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017), pp.54-64, 2017. ,
DOI : 10.18653/v1/s17-1006
URL : https://doi.org/10.18653/v1/s17-1006
Long short-term memory, Neural computation, vol.9, issue.8, pp.1735-1780, 1997. ,
Automatic identification of non-compositional multi-word expressions using latent semantic analysis, Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties, pp.12-19, 2006. ,
DOI : 10.3115/1613692.1613696
URL : http://dl.acm.org/ft_gateway.cfm?id=1613696&type=pdf
How to pick out token instances of English verb-particle constructions, Language Resources and Evaluation, vol.44, issue.1-2, pp.97-113, 2010. ,
Can recognising multiword expressions improve shallow parsing?, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp.636-644, 2010. ,
Finding function in form: Compositional character models for open vocabulary word representation, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp.1520-1530, 2015. ,
DOI : 10.18653/v1/d15-1176
URL : https://doi.org/10.18653/v1/d15-1176
Efficient estimation of word representations in vector space, Proceedings of Workshop at the International Conference on Learning Representations, 2013. ,
Subword language modeling with neural networks, 2012. ,
Composition in distributional models of semantics, Cognitive science, vol.34, issue.8, pp.1388-1429, 2010. ,
DOI : 10.1111/j.1551-6709.2010.01106.x
URL : https://www.era.lib.ed.ac.uk/bitstream/1842/4927/1/Mitchell2011.pdf
Language independent authorship attribution using character level language models, Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics, vol.1, pp.267-274, 2003. ,
An empirical study on compositionality in compound nouns, Proceedings of 5th International Joint Conference on Natural Language Processing, pp.210-218, 2011. ,
Using distributional similarity of multi-way translations to predict multiword expression compositionality, Proceedings of the 14th Conference of the EACL (EACL 2014), pp.472-481, 2014. ,
A word embedding approach to predicting the compositionality of multiword expressions, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Denver, pp.977-983, 2015. ,
Idiom token classification using sentential distributed semantics, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.194-204, 2016. ,
Learning character-level representations for part-of-speech tagging, Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp.1818-1826, 2014. ,
Discriminative lexical semantic segmentation with gaps: Running the MWE gamut, Transactions of the Association for Computational Linguistics, vol.2, pp.193-206, 2014. ,
Is knowledge-free induction of multiword unit dictionary headwords a solved problem, Proceedings of the 6th Conference on Empirical Methods in Natural Language Processing, pp.100-108, 2001. ,
Exploring vector space models to predict the compositionality of German noun-noun compounds, Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, vol.1, pp.255-265, 2013. ,
Learning to capitalize with character-level recurrent neural networks: An empirical study, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp.2090-2095, 2016. ,
Assoziationen zu Unter-, Basis- und Oberbegriffen. Eine explorative Studie, Proceedings of the 9th Norddeutsches Linguistisches Kolloquium, pp.51-74, 2009. ,
LinES: An English-Swedish parallel treebank, Proc. of NODALIDA, pp.270-273, 2007. ,
Multiword expressions, Handbook of Natural Language Processing, pp.267-292, 2010. ,
English Web Treebank, 2012. ,
Multiword expression processing: a survey, Computational Linguistics, vol.43, issue.4, pp.837-892, 2017. ,
Modeling the statistical idiosyncrasy of multiword expressions, Proc. of the 11th Workshop on Multiword Expressions, pp.34-38, 2015. ,
Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions, Proc. of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-02152557
Multiword Expressions Acquisition: A Generic and Open Framework. Theory and Applications of Natural Language Processing, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01199863
Multiword expressions: a pain in the neck for NLP, Computational Linguistics and Intelligent Text Processing, vol.2276, pp.189-206, 2002. ,
The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions, Proc. of the 13th Workshop on Multiword Expressions (MWE 2017), pp.31-47, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01865575
A corpus and model integrating multiword expressions and supersenses, Proc. of NAACL-HLT, pp.1537-1547, 2015. ,
Comprehensive annotation of multiword expressions in a social web corpus, Proc. of LREC, pp.455-461, 2014. ,
SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM), Proc. of SemEval, pp.546-559, 2016. ,
A gold standard dependency corpus for English, Proc. of LREC, pp.2897-2904, 2014. ,
Wikidata: A free collaborative knowledgebase, Communications of the ACM, vol.57, issue.10, pp.78-85, 2014. ,
, Çağrı Çöltekin, 2008.
Universal Stanford dependencies: A cross-linguistic typology, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pp.4585-4592, 2014. ,
Multi-word annotation in Syntactic treebanks: Proposition for Universal Dependency, Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories, pp.181-189, 2017. ,
Building a Large Annotated Corpus of English: The Penn Treebank, Computational Linguistics, vol.19, issue.2, pp.313-330, 1993. ,
Universal Dependencies v1: A Multilingual Treebank Collection, Proceedings of the Tenth International Conference on Language Resources and Evaluation, 2016. ,
Multiword expressions: A pain in the neck for NLP, Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing, CICLing, pp.1-15, 2002. ,
Comprehensive annotation of multiword expressions in a social web corpus, Proceedings of the Ninth International Conference on Language Resources and Evaluation, pp.455-461, 2014. ,
Annotation d'expressions polylexicales verbales en français, Proceedings of Traitement Automatique des Langues Naturelles (TALN), pp.1-9, 2017. ,
Multiword expression processing: A survey, Computational Linguistics, vol.43, issue.4, pp.837-892, 2017. ,
Metaphor in idiom comprehension, Journal of Memory and Language, vol.37, pp.141-154, 1997. ,
What do idioms really mean?, Journal of Memory and Language, vol.31, issue.4, pp.485-506, 1992. ,
Crowdsourcing complex language resources: Playing to annotate dependency syntax, Proceedings of the 26th International Conference on Computational Linguistics (COLING): Technical Papers, pp.3041-3052, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01378980
Games on multiword expressions for community building, INFOtheca: Journal of Information and Library Science, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01635502
Making people play for lexical acquisition, Proceedings of the 7th Symposium on Natural Language Processing, 2007. ,
Experiment-driven development of a GWAP for marking segments in text, Extended Abstracts Publication of the Annual Symposium on Computer-Human Interaction in Play, pp.397-404, 2017. ,
Universal dependencies v1: A multilingual treebank collection, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), 2016. ,
Phrase detectives: Utilizing collective intelligence for internet-scale language resource creation, ACM Trans. Interact. Intell. Syst, vol.3, issue.1, p.44, 2013. ,
How naked is the naked truth? a multilingual lexicon of nominal compound compositionality, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.2, pp.156-161, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01459911
Automation of question generation from sentences, Proceedings of QG2010: The Third Workshop on Question Generation, pp.58-67, 2010. ,
A corpus for modeling morpho-syntactic agreement in Arabic: gender, number and rationality, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers, vol.2, pp.357-362, 2011. ,
Statistical parsing with a context-free grammar and word statistics, p.18, 1997. ,
, LDC Arabic treebanks and associated corpora: Data divisions manual, 2013.
Question parsing for QA in Spanish, Proceedings of the Second Student Research Workshop associated with RANLP 2011, pp.73-78, 2011. ,
Better Arabic parsing: Baselines, evaluations, and analysis, Proceedings of the 23rd International Conference on Computational Linguistics, pp.394-402, 2010. ,
CATiB: The Columbia Arabic treebank, Proceedings of the ACL-IJCNLP 2009 conference short papers, pp.221-224, 2009. ,
On Arabic Transliteration, Arabic Computational Morphology: Knowledge-based and Empirical Methods, 2007. ,
Introduction to Arabic natural language processing, Synthesis Lectures on Human Language Technologies, vol.3, issue.1, pp.1-187, 2010. ,
Analysing the effect of out-of-domain data on SMT systems, Proceedings of the Seventh Workshop on Statistical Machine Translation, pp.422-432, 2012. ,
Exploring difficulties in parsing imperatives and questions, Proceedings of 5th International Joint Conference on Natural Language Processing, pp.749-757, 2011. ,
Good question! statistical ranking for question generation, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp.609-617, 2010. ,
Parsing and question classification for question answering, Proceedings of the workshop on Open-domain question answering, vol.12, pp.1-6, 2001. ,
Questionbank: Creating a corpus of parse-annotated questions, Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pp.497-504, 2006. ,
Fast exact inference with a factored model for natural language parsing, Advances in neural information processing systems, pp.3-10, 2003. ,
Dependency parsing, Synthesis Lectures on Human Language Technologies, vol.1, issue.1, pp.1-127, 2009. ,
The Penn Arabic Treebank: Building a large-scale annotated Arabic corpus, NEMLAR conference on Arabic language resources and tools, vol.27, pp.466-467, 2004. ,
Dialogue patterns of an Arabic robot receptionist, Human-Robot Interaction (HRI), 2010 5th ACM/IEEE International Conference on, pp.167-168, 2010. ,
Dependency parsing of modern standard Arabic with lexical and inflectional features, Computational Linguistics, vol.39, issue.1, pp.161-194, 2013. ,
DOI : 10.1162/coli_a_00138
Uptraining for accurate deterministic question parsing, Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP '10, pp.705-713, 2010. ,
Hard time parsing questions: Building a questionbank for French, Tenth International Conference on Language Resources and Evaluation (LREC 2016), 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01457184
The domain dependence of parsing. ANLC '97, pp.96-102, 1997. ,
, Generating factoid questions with recurrent neural networks: The 30m factoid question-answer corpus, 2016.
The other Arabic treebank: Prague dependencies and functions. Arabic computational linguistics: Current implementations, p.104, 2006. ,
Bootstrapping statistical parsers from small datasets, Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics, vol.1, pp.331-338, 2003. ,
DOI : 10.3115/1067807.1067851
URL : http://dl.acm.org/ft_gateway.cfm?id=1067851&type=pdf
Cross-domain semantic parsing via paraphrasing, 2017. ,
DOI : 10.18653/v1/d17-1127
URL : https://doi.org/10.18653/v1/d17-1127
Universal dependencies for Arabic, Proceedings of the Third Arabic Natural Language Processing Workshop, pp.166-176, 2017. ,
DOI : 10.18653/v1/w17-1320
URL : https://doi.org/10.18653/v1/w17-1320
An Arabic dependency treebank in the travel domain, Proceedings of the 3rd Workshop on Open-Source Arabic Corpora and Processing Tools, 2018. ,
Joint evaluation of morphological segmentation and syntactic parsing, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, vol.2, pp.6-10, 2012. ,
What's in a domain? analyzing genre and topic differences in statistical machine translation, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol.2, pp.560-566, 2015. ,
ACM. Proceedings of the International Conference on Big Data and Advanced Wireless Technologies, p.31, 2016. ,
1. Two universal categories, that is, valid for all languages participating in the task:
(a) LIGHT VERB CONSTRUCTIONS (LVC), divided into two subcategories:
i. LVCs in which the verb is semantically totally bleached (LVC.full), e.g. DE eine Rede halten 'hold a speech' ⇒ 'give a speech'
ii. LVCs in which the verb adds a causative meaning to the noun (LVC.cause), e.g. PL narazić na straty 'expose to losses'
(b) VERBAL IDIOMS (VID), grouping all VMWEs not belonging to other categories, and most often having a relatively high degree of semantic non-compositionality
2. Three quasi-universal categories, valid for some language groups or languages, but not all:
(a) INHERENTLY REFLEXIVE VERBS (IRV), pervasive in Romance and Slavic languages, and present in Hungarian and German, in which the reflexive clitic (REFL) either always co-occurs with a given verb, or markedly changes its meaning or subcategorisation frame, e.g. PT se formar 'REFL form' ⇒ 'graduate'
(b) VERB-PARTICLE CONSTRUCTIONS (VPC), pervasive in Germanic languages and Hungarian, rare in Romance and absent in Slavic languages, with two subcategories:
i. fully non-compositional VPCs (VPC.full), in which the particle totally changes the meaning of the verb, e.g. HU berúg 'in-kick' ⇒ 'get drunk'
ii. semi non-compositional VPCs (VPC.semi), in which the particle adds a partly predictable but non-spatial meaning to the verb, e.g. EN wake up
(c) MULTI-VERB CONSTRUCTIONS (MVC), close to semantically non-compositional serial verbs in Asian languages like Chinese, Hindi, Indonesian and Japanese
3. One language-specific category, introduced for Italian:
Hazem Al Saied, Matthieu Constant, and Marie Candito. 2017. The ATILF-LLF system for Parseme shared task: a transition-based verbal multiword expression tagger, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.127-132
Automatic conversion of the Basque dependency treebank to Universal Dependencies, Proceedings of the Fourteenth International Workshop on Treebanks and Linguistic Theories (TLT14), pp.233-241, 2015. ,
FidaPLUS corpus of Slovenian: the new generation of the Slovenian reference corpus: its design and tools, Proceedings of the Corpus Linguistics Conference, CL2007, 2007. ,
Multiword expressions, Handbook of Natural Language Processing, pp.267-292, 2010. ,
, Sri Ramagurumurthy Vishnu, and Fei Xia, 2015.
, Findings of the 2016 Conference on Machine Translation (WMT16). In Proceedings of the First Conference on Machine Translation (WMT16), vol.2, pp.131-198, 2016.
A data-driven approach to verbal multiword expression detection. PARSEME Shared Task system description paper, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.121-126, 2017. ,
Strategies for contiguous multiword expression analysis and dependency parsing, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, vol.1, pp.743-753, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01022415
Le corpus sequoia : annotation syntaxique et exploitation pour l'adaptation d'analyseur par pont lexical, Proceedings of TALN 2012, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00698938
A transition-based system for joint lexical and syntactic analysis, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.161-171, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01808689
Multiword expression processing: A survey, Computational Linguistics, vol.43, issue.4, pp.837-892, 2017. ,
The Szeged TreeBank, Proceedings of the 8th International Conference on Text, Speech and Dialogue, TSD 2005, pp.123-132, 2005. ,
Joint parsing and named entity recognition, HLT-NAACL, pp.326-334, 2009. ,
Multiword expression identification with tree substitution grammars: A parsing tour de force with French, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp.725-735, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-01111383
Parsing models for identifying multiword expressions, Computational Linguistics, vol.39, issue.1, pp.195-227, 2013. ,
Neural networks for multi-word expression detection, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.60-65, 2017. ,
Rositsa Dekova, Tsvetana Dimitrova, and Ekaterina Tarpomanova. 2012. The Bulgarian National Corpus: Theory and practice in corpus design, Journal of Language Modelling, vol.0, issue.1, pp.65-110 ,
Training corpus ssj500k 2.0. Slovenian language resource repository CLARIN, 2017. ,
Syntactic parsing and compound recognition via dual decomposition: Application to French, Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp.1875-1885, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01074298
The PAISÀ Corpus of Italian Web Texts, Proceedings of the 9th Web as Corpus Workshop (WaC-9), pp.36-43, 2014. ,
Detection of verbal multi-word expressions via conditional random fields with syntactic dependency features and semantic re-ranking, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.114-120, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01520762
Universal dependency annotation for multilingual parsing, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, vol.2, pp.92-97, 2013. ,
Joint dependency parsing and multiword expression tokenization, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol.1, pp.1116-1126, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01199865
Parsing and MWE detection: Fips at the PARSEME shared task, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.54-59, 2017. ,
Universal Dependencies v1: a multilingual treebank collection, Proceedings of the Tenth International Conference on Language Resources and Evaluation, pp.1659-1666, 2016. ,
, Coreference in Polish: Annotation, Resolution and Evaluation, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01174653
National Corpus of Polish, Proceedings of the 5th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, pp.259-263, 2011. ,
Persian in MULTEXT-East framework, Advances in Natural Language Processing, pp.541-551, 2006. ,
A survey of multiword expressions in treebanks, Proceedings of the 14th International Workshop on Treebanks & Linguistic Theories conference, 2015. ,
Multiword Expressions: A Pain in the Neck for NLP, Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2002), pp.1-15, 2002. ,
PARSEME-PARSing and Multiword Expressions within a European multilingual network, 7th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01223349
The PARSEME shared task on automatic identification of verbal multiword expressions, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.31-47, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01865575
Ivelina Stoyanova, and Veronika Vincze. forthcoming. PARSEME multilingual corpus of verbal multiword expressions, Multiword expressions at length and in depth, 2017. ,
SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM), Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pp.546-559, 2016. ,
USzeged: Identifying verbal multiword expressions with POS tagging and parsing techniques, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.48-53, 2017. ,
Iarg-AnCora: Spanish corpus annotated with implicit arguments, Language Resources and Evaluation, vol.50, issue.3, pp.549-584, 2016. ,
Dependency parsing for identifying Hungarian light verb constructions, Proceedings of the Sixth International Joint Conference on Natural Language Processing, pp.207-215, 2013. ,
Promoting multiword expressions in A* TAG parsing, COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, pp.429-439, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01378903
Sentence analysis and collocation identification, Proceedings of the Workshop on Multiword Expressions: from Theory to Applications (MWE 2010), pp.27-35, 2010. ,
The relevance of collocations for parsing, Proceedings of the 10th Workshop on Multiword Expressions (MWE), pp.26-32, 2014. ,
, Appendix A: Composition of the corpus annotation teams
, Polona Gantar (LL), Simon Krek (LL), Špela Arhar Holdt, Jaka Čibej, Teja Kavčič, Taja Kuzman. Germanic languages
, Other languages: (AR) Abdelati Hawwari (LL)
Stella Papadelli; (FA) Behrang QasemiZadeh (LL), Proceedings of the ECML Workshop on Mining and Learning in Graphs, 2006. ,
, XML Tree Transformations with Probabilistic Models. Theses, 2007.
URL : https://hal.archives-ouvertes.fr/tel-00342649
Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proceedings of the Eighteenth International Conference on Machine Learning, ICML '01, pp.282-289, 2001. ,
Practical very large scale CRFs, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pp.504-513, 2010. ,
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with Syntactic Dependency Features and Semantic Re-Ranking, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.114-120, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01520762
Multilingual Word Segmentation: Training Many Language-Specific Tokenizers Smoothly Thanks to the Universal Dependencies Corpus, Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01822151
Semantic Re-Ranking of CRF Label Sequences for Verbal Multiword Expression Identification, 2018. ,
Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions, Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01865575
Behrang QasemiZadeh, Marie Candito, Fabienne Cap, Voula Giouli, Ivelina Stoyanova, and Antoine Doucet. 2017. The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions, Proceedings of The 13th Workshop on Multiword Expressions, pp.31-47 ,
An Introduction to Conditional Random Fields, Found. Trends Mach. Learn, vol.4, issue.4, pp.267-373, 2012. ,
, Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org
Multiword expressions. Handbook of natural language processing, vol.2, pp.267-292, 2010. ,
Multiword expression processing: a survey, Computational Linguistics, vol.43, issue.4, pp.837-892, 2017. ,
Learning Word Vectors for 157 Languages, Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), 2018. ,
Speech recognition with deep recurrent neural networks, Acoustics, speech and signal processing (icassp), 2013 ieee international conference on, pp.6645-6649, 2013. ,
, Bidirectional LSTM-CRF models for sequence tagging, 2015.
Neural Networks for Multi-Word Expression Detection, p.60, 2017. ,
Conditional random fields: Probabilistic models for segmenting and labeling sequence data, 2001. ,
Phrase representations for multiword expressions, Proceedings of the 12th Workshop on Multiword Expressions, 2016. ,
DOI : 10.18653/v1/w16-1810
URL : https://doi.org/10.18653/v1/w16-1810
Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions, Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01865575
Reporting score distributions makes a difference: Performance study of lstm-networks for sequence tagging, 2017. ,
Discriminative lexical semantic segmentation with gaps: running the MWE gamut, Transactions of the Association for Computational Linguistics, vol.2, pp.193-206, 2014. ,
How do externally computed word embeddings influence the performance of this methodology on MWE detection?
Will this graph-based decoding strategy have a positive impact on standard or domain-specific NER?
What is the source of the lower F-scores on languages such as Hungarian, German and Hindi that, at first glance, have enough training data to support our approach?
How will this method work for other NLP tasks that involve sparse and long-range dependencies between words, one good example being co-reference resolution?
Whether parsing accuracy is sufficient to support MWE identification is a different question. Also, given that our system is inspired by parsing, our intuition is that parsing will not improve results. On a related note, NLP-Cube can process raw text end-to-end into UD format, so it can be used for MWE detection without requiring external CUPT files. Anyone interested can check NLP-Cube's end-to-end raw text processing on this year's shared task on Universal Dependencies parsing.
The atilf-llf system for parseme shared task: a transition-based verbal multiword expression tagger, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.127-132, 2017. ,
, Named entity recognition with bidirectional lstm-cnns, 2015.
Supervised sequence labelling with recurrent neural networks, vol.385, 2012. ,
Offline handwriting recognition with multidimensional recurrent neural networks, Advances in neural information processing systems, pp.545-552, 2009. ,
Simple and accurate dependency parsing using bidirectional LSTM feature representations, 2016. ,
Neural architectures for named entity recognition, 2016. ,
Universal dependency annotation for multilingual parsing, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, vol.2, pp.92-97, 2013. ,
Edition 1.1 of the PARSEME shared task on automatic identification of verbal multiword expressions, Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01865575
The PARSEME shared task on automatic identification of verbal multiword expressions, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.31-47, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01865575
Multilingual named entity recognition using hybrid neural networks, The Sixth Swedish Language Technology Conference (SLTC), 2016. ,
UDPipe: Trainable pipeline for processing CoNLL-U files performing tokenization, morphological analysis, POS tagging and parsing, Language Resources and Evaluation Conference, p.260, 2016. ,
, Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), pp.261-267
, Xiaoqiang Zheng. 2015. TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org
The ATILF-LLF system for PARSEME shared task: a transition-based verbal multiword expression tagger, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.127-132, 2017. ,
Multiword expressions. Handbook of natural language processing, vol.2, pp.267-292, 2010. ,
Enriching word vectors with subword information, 2016. ,
A data-driven approach to verbal multiword expression detection. PARSEME shared task system description paper, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.121-126, 2017. ,
Named entity recognition with bidirectional LSTM-CNNs, Transactions of the Association for Computational Linguistics, vol.4, pp.357-370, 2016. ,
Long short-term memory, Neural computation, vol.9, issue.8, pp.1735-1780, 1997. ,
Neural networks for multi-word expression detection, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.60-65, 2017. ,
Non-lexical neural architecture for fine-grained POS tagging, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp.232-237, 2015. ,
Detection of verbal multi-word expressions via conditional random fields with syntactic dependency features and semantic re-ranking, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.114-120, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01520762
Parsing and MWE detection: Fips at the PARSEME shared task, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.54-59, 2017. ,
Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions, Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01865575
Software Framework for Topic Modelling with Large Corpora, Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp.45-50, 2010. ,
Building large corpora from the web using a new efficient tool chain, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), pp.12-1497, 2012. ,
USzeged: Identifying verbal multiword expressions with POS tagging and parsing techniques, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.48-53, 2017. ,
A fast and accurate dependency parser using neural networks, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.740-750, 2014. ,
A transition-based system for joint lexical and syntactic analysis, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.161-171, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01808689
LIBLINEAR: A library for large linear classification, Journal of machine learning research, vol.9, pp.1871-1874, 2008. ,
Neural networks for multi-word expression detection, Proceedings of the 13th Workshop on Multiword Expressions, pp.60-65, 2017. ,
Dependency Parsing, 2009. ,
Detection of verbal multi-word expressions via conditional random fields with syntactic dependency features and semantic re-ranking, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.114-120, 2017. ,
DOI : 10.18653/v1/w17-1715
URL : https://hal.archives-ouvertes.fr/hal-01520762
Incrementality in deterministic dependency parsing, Proceedings of the ACL Workshop Incremental Parsing: Bringing Engineering and Cognition Together, pp.50-57, 2004. ,
DOI : 10.3115/1613148.1613156
URL : http://dl.acm.org/ft_gateway.cfm?id=1613156&type=pdf
Deep convolutional neural network textual features and multiple kernel learning for utterance-level multimodal sentiment analysis, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp.2539-2544, 2015. ,
DOI : 10.18653/v1/d15-1303
URL : https://doi.org/10.18653/v1/d15-1303
Random positive-only projections: PPMI-enabled incremental semantic space construction, Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics, pp.189-198, 2016. ,
HHU at SemEval-2017 task 2: Fast hash-based embeddings for semantic word similarity assessment, Proceedings of the 11th International Workshop on Semantic Evaluation, 2017. ,
Sketching word vectors through hashing, 2017. ,
Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions, Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01865575
CNN features off-the-shelf: An astounding baseline for recognition, Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW '14, pp.512-519, 2014. ,
The ATILF-LLF system for PARSEME shared task: a transition-based verbal multiword expression tagger, Proceedings of the 13th Workshop on Multiword Expressions, pp.127-132, 2017. ,
The PARSEME shared task on automatic identification of verbal multiword expressions, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.31-47, 2017. ,
DOI : 10.18653/v1/w17-1704
URL : https://hal.archives-ouvertes.fr/hal-01865575
, unordered pair {(l v , m v )
We used the above set of templates consistently for all languages and for all VMWE categories, except for Lithuanian, for which dependency trees were not available. We therefore converted each sentence in the LT dataset to a pseudo-dependency tree in which (i) the first token is the root and (ii) every other token is the child of the preceding token, thus obtaining a model equivalent to a second-order sequential CRF. We also adapted the default set of templates to Lithuanian by replacing the sibling templates with selected grandparent-related templates.
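The pseudo-dependency conversion described above amounts to turning a sentence into a chain. A minimal sketch in Python, assuming CoNLL-U style 1-based token IDs with 0 standing for the artificial root; the function name `pseudo_dependency_heads` is hypothetical and not taken from the system's code:

```python
def pseudo_dependency_heads(num_tokens):
    """Return head IDs for a chain-shaped pseudo-dependency tree.

    Token IDs are 1-based. Token 1 is the root (its head is the artificial
    root, ID 0); every other token i has the preceding token i - 1 as head.
    """
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    # Position k (0-based) holds the head of token k + 1, which is k.
    return list(range(num_tokens))


# A four-token sentence: heads of tokens 1..4.
print(pseudo_dependency_heads(4))
```

In such a chain no token has siblings, which is consistent with replacing the sibling templates by grandparent-related ones.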
The pre-processing method most often applied, case lifting, consisted in reattaching case dependents to their grandparents so as to make MWEs of certain categories (notably inherently adpositional verbs) connected. We applied it to BG, [...]. In the case of Slovak, we relied on language-specific POS tags rather than universal tags.
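Case lifting as described above can be sketched as follows. This is an illustration only, assuming CoNLL-U style 1-based head IDs (0 for the artificial root) and the UD relation label `case`; the function name `lift_case` and the decision to leave a token untouched when its parent is the root are assumptions, not details confirmed by the text:

```python
def lift_case(heads, deprels, relation="case"):
    """Reattach every `relation` dependent to its grandparent.

    heads[i] is the 1-based head ID of token i + 1 (0 = artificial root);
    deprels[i] is its dependency relation. Grandparents are looked up in
    the original tree, so the result does not depend on token order.
    """
    lifted = list(heads)
    for i, (head, rel) in enumerate(zip(heads, deprels)):
        if rel != relation or head == 0:
            continue
        grandparent = heads[head - 1]
        # Assumption: if the parent is the root, there is no real
        # grandparent to attach to, so the token stays where it is.
        if grandparent != 0:
            lifted[i] = grandparent
    return lifted


# "She relies on him": in UD, "on" (3) is a case dependent of "him" (4),
# which depends on "relies" (2). After lifting, "on" attaches to "relies".
heads = [2, 0, 4, 2]
deprels = ["nsubj", "root", "case", "obl"]
print(lift_case(heads, deprels))
```

After the transformation the verb and its adposition are linked by a direct parent-child edge, which is what makes such expressions connected in the tree.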
Table 1: Detailed results of TRAVERSAL for 19 languages (identified by their ISO 639-1 codes).
Tokens with an unspecified dependency head were attached to the artificial root node (with ID=0). The same pre-processing steps were applied to TRAIN, DEV, and (blind) TEST data.
Segmentation. Once the labeling of a given dependency tree is determined, we need to determine the boundaries of the detected MWEs. To this end, we considered two heuristics: (i) all adjacent nodes marked as MWEs of the same category are considered as a single MWE occurrence, and (ii) if a group of adjacent nodes is marked as MWEs but it contains two
We applied the first heuristic for all languages except Farsi, where the second heuristic yielded better results, notably due to a relatively high frequency of neighboring MWEs in the FA dataset.
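The first segmentation heuristic can be read as grouping labeled nodes into connected components of the dependency tree. The sketch below is one possible reading, not the system's implementation: it assumes that "adjacent" means linked by a parent-child or sibling relation, that heads are 1-based with 0 for the artificial root, and that tokens hanging directly from the artificial root are not treated as siblings. The name `segment_mwes` is hypothetical.

```python
from collections import defaultdict


def segment_mwes(heads, labels):
    """Group adjacent same-category nodes into MWE occurrences.

    heads[i] is the 1-based head ID of token i + 1 (0 = artificial root);
    labels[i] is the predicted MWE category of token i + 1, or None.
    Returns a sorted list of (category, [token IDs]) pairs.
    """
    n = len(heads)
    parent = list(range(n + 1))  # union-find over token IDs 1..n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    siblings = defaultdict(list)
    for tok in range(1, n + 1):
        cat = labels[tok - 1]
        head = heads[tok - 1]
        if cat is None or head == 0:
            continue
        if labels[head - 1] == cat:
            union(tok, head)  # parent-child adjacency
        siblings[(head, cat)].append(tok)
    for group in siblings.values():
        for other in group[1:]:
            union(group[0], other)  # sibling adjacency

    components = defaultdict(list)
    for tok in range(1, n + 1):
        if labels[tok - 1] is not None:
            components[find(tok)].append(tok)
    return sorted((labels[c[0] - 1], c) for c in components.values())


# "He took a decision and gave up": two VMWEs of different categories.
heads = [2, 0, 4, 2, 6, 2, 6]
labels = [None, "LVC.full", None, "LVC.full", None, "VPC.full", "VPC.full"]
print(segment_mwes(heads, labels))
```

Under this reading, two neighboring MWEs of the same category would be merged into one occurrence, which may be why a different heuristic worked better on a dataset with many neighboring MWEs.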
For each language, the MWE-based and token-based precision (P), recall (R), and F1 scores are reported, as well as the rank (Rank) of our system and the difference (Delta) between TRAVERSAL's F1 score and the score of the other best closed-track system. The datasets with dependencies annotated manually, partially manually, or not at all are marked with [...], [...], or ?, respectively. For the other datasets/sentences, dependencies were obtained automatically. Con is the % of connected (via parental or sibling relation) VMWEs in the TRAIN+DEV dataset (no value implies Con = Con_p), and Con_p is the same measure after pre-processing. Finally, Iso_p is the % of connected and isolated (with no adjacent VMWEs of the same category) VMWEs after pre-processing, for which the baseline segmentation heuristic is sufficient.
[...] according to both official evaluation measures: MWE-based F1 and token-based F1. Table 1 summarizes the performance of our system across 19 languages of the shared task (all except Arabic). Language-wise, our system performed particularly well for Slavic and Romance languages, which is likely related to our choice of Polish and French for feature template engineering. FA was the most [...]
References
Anne Abeillé and Yves Schabes. 1989. Parsing Idioms in Lexicalized TAGs, pp.1-9
Prague dependency treebank 2.5-a revisited version of pdt 2.0, Proceedings of the 24th International Conference on Computational Linguistics, pp.231-246, 2012. ,
Strategies for Contiguous Multiword Expression Analysis and Dependency Parsing, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, vol.1, pp.743-753, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01022415
A Transition-Based System for Joint Lexical and Syntactic Analysis, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.161-171, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01808689
Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.204-212, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00790613
Directed Hypergraphs and Applications, Discrete Appl. Math, vol.42, issue.2-3, pp.177-201, 1993. ,
Parsing Models for Identifying Multiword Expressions, Computational Linguistics, vol.39, issue.1, 2013. ,
Tree-Adjoining Grammars, Grzegorz Rozenberg and Arto Salomaa, pp.69-123, 1997. ,
Parsing and Hypergraphs, Seventh International Workshop on Parsing Technologies (IWPT-2001), 2001. ,
Dependency parsing, Synthesis Lectures on Human Language Technologies, vol.1, issue.1, pp.1-127, 2009. ,
Syntactic Parsing and Compound Recognition via Dual Decomposition: Application to French, Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp.1875-1885, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01074298
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with Syntactic Dependency Features and Semantic Re-Ranking, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.114-120, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01520762
Online learning of approximate dependency parsing algorithms, 11th Conference of the European Chapter of the Association for Computational Linguistics, 2006. ,
On the complexity of non-projective data-driven dependency parsing, Proceedings of the Tenth International Conference on Parsing Technologies, pp.121-132, 2007. ,
VPCTagger: Detecting Verb-Particle Constructions With Syntax-Based Methods, Proceedings of the 10th Workshop on Multiword Expressions (MWE), pp.17-25, 2014. ,
Joint Dependency Parsing and Multiword Expression Tokenization, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol.1, pp.1116-1126, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01199865
On the momentum term in gradient descent learning algorithms, Neural networks, vol.12, issue.1, pp.145-151, 1999. ,
Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions, Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-02152557
An overview of gradient descent optimization algorithms, 2016. ,
Identification of Ambiguous Multiword Expressions Using Sequence Models and Lexical Resources, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.167-175, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01795903
An introduction to conditional random fields, Foundations and Trends in Machine Learning, vol.4, pp.267-373, 2012. ,
Learning to Detect English and Hungarian Light Verb Constructions, ACM Trans. Speech Lang. Process, vol.10, issue.2, pp.1-6, 2013. ,
Promoting multiword expressions in A* TAG parsing, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp.429-439, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01378903
Multiword expressions, Handbook of Natural Language Processing, pp.267-292, 2010. ,
Multiword expression processing: A survey, Computational Linguistics, vol.43, issue.4, pp.837-892, 2017. ,
If you've seen some, you've seen them all: Identifying variants of multiword expressions, Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics. The COLING 2018 Organizing Committee, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01866345
[...] of the VMWE, and not its span, thus the need for a special tag 'g' to indicate intermediate tokens. During system development, one of our goals was to evaluate different tagging schemes and choose the best one based on performance on the development corpus. Therefore, in addition to the extended BIO scheme, we also tested an adaptation that includes category labels (BIO+cat). 'B' and 'I' tags are thus concatenated with the provided VMWE category labels (IRV, LVC.full, VID, etc.). The idea is that categories present quite heterogeneous characteristics, so it may be a good idea to model/learn them separately in the neural network. This is illustrated in the last row of Figure 1. Finally, we have also evaluated our system using an inside-outside scheme similar to the one used in MUMULS [...]

We use CoNLL-U's LEMMA and UPOS fields as input features (falling back to FORM and XPOS, respectively, if the former are absent). Each token's LEMMA and UPOS are converted into one-hot vectors, which are then transformed into embeddings and concatenated. Input LEMMA and UPOS embeddings are pre-initialized on the shared task training corpora, but fine-tuned during the training phase. These embeddings are then forwarded to a double bidirectional recurrent layer using gated recurrent units (GRU). Finally, each BIO label prediction is based on a softmax layer that takes as input the concatenation of the GRU cell outputs in both directions for each token.
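The extended BIO scheme and its BIO+cat variant described above can be sketched as a small encoder. This is an illustrative reading, not the authors' code: it assumes 0-based token positions, non-overlapping VMWEs, and a hyphen as the separator between the tag and the category label (the text only says the two are concatenated). The function name `encode_bio` is hypothetical.

```python
def encode_bio(num_tokens, vmwes, with_category=False):
    """Encode VMWE annotations as one tag per token.

    vmwes is a list of (positions, category) pairs, where positions are the
    0-based indices of the lexicalized tokens of one VMWE. Tokens lying
    between the first and last component without belonging to the VMWE
    receive the gap tag 'g'; everything else is 'O'.
    """
    tags = ["O"] * num_tokens
    for positions, category in vmwes:
        members = set(positions)
        first, last = min(members), max(members)
        # Assumption: '-' separates the tag from the category label.
        suffix = "-" + category if with_category else ""
        for i in range(first, last + 1):
            if i == first:
                tags[i] = "B" + suffix
            elif i in members:
                tags[i] = "I" + suffix
            else:
                tags[i] = "g"
    return tags


# "She took a difficult decision yesterday": "took ... decision" is an LVC.
vmwes = [([1, 4], "LVC.full")]
print(encode_bio(6, vmwes))
print(encode_bio(6, vmwes, with_category=True))
```

With category labels enabled, the tag set grows with the number of categories, which is the price paid for letting the network learn each category separately.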
MWU-aware part-of-speech tagging with a CRF model and lexical resources, Proceedings of the ACL Workshop on Multiword Expressions: From Parsing and Generation to the Real World, pp.49-56, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00621585
Multiword expression processing: A survey, Computational Linguistics, vol.43, issue.4, pp.837-892, 2017. ,
Neural networks for multi-word expression detection, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.60-65, 2017. ,
Detection of verbal multi-word expressions via conditional random fields with syntactic dependency features and semantic re-ranking, Proceedings of the 13th Workshop on Multiword Expressions, MWE '17, pp.114-120, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01520762
Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions, Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-02152557
Text chunking using transformation-based learning, 3rd Workshop on Very Large Corpora, pp.82-94, 1995. ,
Impact of MWE resources on multiword recognition, Proceedings of the 12th Workshop on Multiword Expressions, MWE '16, pp.107-111, 2016. ,
The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp.31-47, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01865575
Discriminative lexical semantic segmentation with gaps: Running the MWE gamut, Transactions of the Association for Computational Linguistics, vol.2, issue.1, pp.193-206, 2014. ,
Identification of ambiguous multiword expressions using sequence models and lexical resources, Proceedings of the 13th Workshop on Multiword Expressions, MWE '17, pp.167-175, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01795903
Identification d'expressions polylexicales avec réseaux de neurones récurrents, Traitement Automatique des Langues, 2018. ,