Conference papers

A semi-automatically generated TAG for Arabic: Dealing with linguistic phenomena

Abstract : Arabic is a challenging language when it comes to grammar production and parsing. It combines complex linguistic phenomena with a rich morphology, which together make its processing particularly ambiguous. This led us to choose the Tree-Adjoining Grammar (TAG) formalism. Indeed, TAG provides sufficient constraints for handling diverse linguistic phenomena and seems adequate for representing Arabic syntactic structures. In this paper, we present a semi-automatically generated TAG for Modern Standard Arabic, produced using a compiler and a metagrammatical description language called XMG (eXtensible MetaGrammar). We focus on the linguistic coverage of our grammar and show how we used the properties of TAG and XMG to define different linguistic phenomena in an expressive and concise way. To check the coverage of our grammar, we set up a development environment that includes a parser and a test corpus of linguistic phenomena containing both grammatical and ungrammatical sentences.
Document type :
Conference papers
Contributor : Yannick Parmentier
Submitted on : Tuesday, April 10, 2018 - 11:39:18 AM
Last modification on : Friday, December 11, 2020 - 11:14:04 AM


  • HAL Id : hal-01762597, version 1


Chérifa Ben Khelil, Chiraz Zribi, Denys Duchier, Yannick Parmentier. A semi-automatically generated TAG for Arabic: Dealing with linguistic phenomena. 19th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2018), Mar 2018, Hanoi, Vietnam. ⟨hal-01762597⟩


