Poster communications

Construction of linguistic resources for the extraction of "complex text segments"

Abstract : The core of our work is the development of computational linguistic resources (electronic dictionaries and grammars) for the automatic extraction, identification, and fine-grained annotation of "complex text segments". We use and extend the notion of multi-word units (MWUs) to cover a broad range of linguistic objects: compound nouns, entity names, verbal forms (compound tenses and negated forms, clauses inserted between the auxiliary and the past participle, etc.) and frozen expressions (i.e. idioms). Complex sequences of text segments are identified using dictionary graphs, which combine the power and versatility of local grammars with the expressivity of electronic dictionaries.
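As a rough illustration of the idea of pairing a dictionary with a local grammar, the following minimal Python sketch tags tokens with a toy dictionary and then applies one finite-state pattern, AUX (NEG|ADV)* PP, to find compound verbal forms with material inserted between the auxiliary and the past participle. Everything here is an assumption made for illustration: the dictionary entries, the codes AUX/NEG/ADV/PP, and the function names are hypothetical and do not reproduce the authors' resources or formalism. Full inserted clauses, which the abstract also mentions, are not handled.

```python
from typing import Dict, List, Tuple

# Hypothetical toy "electronic dictionary": surface form -> grammatical code.
DICTIONARY: Dict[str, str] = {
    "has": "AUX", "have": "AUX", "had": "AUX",
    "not": "NEG",
    "never": "ADV", "often": "ADV", "really": "ADV",
    "seen": "PP", "written": "PP", "finished": "PP",
}


def tag(tokens: List[str]) -> List[str]:
    """Look up each token in the dictionary; unknown words get the code UNK."""
    return [DICTIONARY.get(token.lower(), "UNK") for token in tokens]


def find_compound_tenses(sentence: str) -> List[Tuple[int, int, str]]:
    """Apply the toy local grammar AUX (NEG|ADV)* PP to a sentence.

    Returns (start, end, text) triples, where start/end are token offsets
    (end exclusive) of each matched segment.
    """
    tokens = sentence.split()
    codes = tag(tokens)
    matches: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        if codes[i] == "AUX":
            j = i + 1
            # Skip any negation or adverbs inserted after the auxiliary.
            while j < len(tokens) and codes[j] in ("NEG", "ADV"):
                j += 1
            # The pattern only succeeds if a past participle follows.
            if j < len(tokens) and codes[j] == "PP":
                matches.append((i, j + 1, " ".join(tokens[i:j + 1])))
                i = j + 1
                continue
        i += 1
    return matches


if __name__ == "__main__":
    print(find_compound_tenses("She has not really finished the report"))
```

The dictionary supplies the lexical knowledge (which word belongs to which class), while the pattern supplies the sequential constraint; swapping in a larger dictionary extends coverage without changing the pattern.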
Document type :
Poster communications
Contributor : Claude Martineau
Submitted on : Saturday, January 28, 2017 - 7:27:12 PM
Last modification on : Thursday, September 29, 2022 - 2:21:15 PM


  • HAL Id : hal-01448712, version 1


Tita Kyriacopoulou, Claude Martineau, Cristian Martinez, Aggeliki Fotopoulou. Construction of linguistic resources for the extraction of "complex text segments". PARSEME 2nd general meeting, Mar 2014, Athens, Greece. ⟨hal-01448712⟩


