Joint Dependency Parsing and Multiword Expression Tokenisation

Abstract : Complex conjunctions and determiners are often treated as pretokenized units in parsing. This is not always realistic, since they can be ambiguous. We propose a model for joint dependency parsing and multiword expression identification, in which complex function words are represented as individual tokens linked by morphological dependencies. Our graph-based parser includes standard second-order features and verbal subcategorization features derived from a syntactic lexicon. We train it on a modified version of the French Treebank enriched with morphological dependencies. It recognizes 81.79% of ADV+que conjunctions with 91.57% precision, and 82.74% of de+DET determiners with 86.70% precision.
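
The representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' parser: the label name "morph", the choice to attach the second word to the first, the other dependency labels, and the two toy analyses are all assumptions made for illustration. The sketch only shows how a multiword unit can be read off a dependency tree when its parts are kept as separate tokens, and how the same word sequence yields no unit under a different analysis.

```python
from dataclasses import dataclass

# Hypothetical label name for a morphological dependency (an assumption).
MORPH = "morph"


@dataclass
class Token:
    idx: int    # 1-based position in the sentence
    form: str   # surface form
    head: int   # index of the governor, 0 for the root
    label: str  # dependency label (illustrative names only)


def extract_mwes(tokens):
    """Return the multiword units implied by MORPH arcs, in sentence order."""
    groups = {}
    for tok in tokens:
        if tok.label == MORPH:
            # Start the group with the governor, then add the dependent.
            groups.setdefault(tok.head, [tok.head]).append(tok.idx)
    by_idx = {t.idx: t for t in tokens}
    return [
        " ".join(by_idx[i].form for i in sorted(members))
        for _, members in sorted(groups.items())
    ]


# Toy analysis 1: "bien qu'" read as one complex conjunction.
concessive = [
    Token(1, "Il", 2, "suj"),
    Token(2, "sort", 0, "root"),
    Token(3, "bien", 2, "mod"),
    Token(4, "qu'", 3, MORPH),
    Token(5, "il", 6, "suj"),
    Token(6, "pleuve", 3, "obj"),
]

# Toy analysis 2: "bien" is a plain adverb and "qu'" a plain complementizer.
adverbial = [
    Token(1, "Je", 2, "suj"),
    Token(2, "pense", 0, "root"),
    Token(3, "bien", 2, "mod"),
    Token(4, "qu'", 2, "obj"),
    Token(5, "il", 6, "suj"),
    Token(6, "pleut", 4, "obj"),
]

print(extract_mwes(concessive))  # ["bien qu'"]
print(extract_mwes(adverbial))   # []
```

Because the decision is carried by an arc label rather than by the tokenizer, an ambiguous sequence can be resolved together with the rest of the syntactic analysis instead of being fixed beforehand.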
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01464872
Contributor : Alexis Nasr
Submitted on : Wednesday, February 15, 2017 - 5:24:25 PM
Last modification on : Monday, March 4, 2019 - 2:04:14 PM
Long-term archiving on : Tuesday, May 16, 2017 - 12:10:17 PM

Identifiers

  • HAL Id : hal-01464872, version 1

Citation

Alexis Nasr, Carlos Ramisch, José Deulofeu, André Valli. Joint Dependency Parsing and Multiword Expression Tokenisation. Annual Meeting of the Association for Computational Linguistics, Jul 2015, Beijing, China. pp. 1116-1126. ⟨hal-01464872⟩
