Construction of linguistic resources for the extraction of "complex text segments"

Tita Kyriacopoulou; Claude Martineau; Cristian Martinez; Aggeliki Fotopoulou

Conference poster, Year: 2014


Tita Kyriacopoulou

Role: Author
PersonId : 20648
IdHAL : tita-kyriacopoulou

Laboratoire d'Informatique Gaspard-Monge

Claude Martineau

Role: Author
PersonId : 1497
IdHAL : claude-martineau

Laboratoire d'Informatique Gaspard-Monge

Cristian Martinez

Role: Author

Laboratoire d'Informatique Gaspard-Monge

Aggeliki Fotopoulou

Role: Author

Institute for Language and Speech Processing

Abstract

The core of our work is the development of computational linguistic resources (electronic dictionaries and grammars) for the automatic extraction, identification, and fine-grained annotation of "complex text segments". We use and extend the notion of multi-word units (MWUs) to allow a broad description of linguistic objects: compound nouns, entity names, verbal forms (compound tenses and negated forms, insertion of clauses between the auxiliary and the past participle, etc.) and frozen expressions (i.e. idioms). Complex sequences of text segments are identified using dictionary graphs, which combine the power and versatility of local grammars with the expressivity of electronic dictionaries.
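As a rough illustration of the idea in the abstract, the sketch below pairs a toy electronic dictionary (word form, lemma, grammatical codes) with a toy local grammar for a compound tense that allows adverbs between the auxiliary and the past participle. Everything in it is an assumption made for illustration: the dictionary entries, the code names (`AUX`, `ADV`, `PP`), the grammar encoding, and the functions `has_code`, `match_at`, and `extract_segments` are hypothetical and are not taken from the authors' resources or tools.

```python
# Toy electronic dictionary: inflected form -> (lemma, grammatical codes).
# Entries and code names are illustrative assumptions only.
TOY_DICTIONARY = {
    "has": ("have", {"V", "AUX"}),
    "have": ("have", {"V", "AUX"}),
    "not": ("not", {"ADV", "NEG"}),
    "never": ("never", {"ADV", "NEG"}),
    "really": ("really", {"ADV"}),
    "seen": ("see", {"V", "PP"}),
    "eaten": ("eat", {"V", "PP"}),
}

# Toy local grammar: a linear sequence of (required code, min, max repeats).
# Here: one auxiliary, up to two inserted adverbs, one past participle.
COMPOUND_TENSE = [("AUX", 1, 1), ("ADV", 0, 2), ("PP", 1, 1)]


def has_code(token, code):
    """True if the dictionary lists `token` with the grammatical code `code`."""
    entry = TOY_DICTIONARY.get(token.lower())
    return entry is not None and code in entry[1]


def match_at(tokens, start, grammar):
    """Return the end index of a match beginning at `start`, or None.

    Each grammar state is matched greedily, with backtracking to shorter
    repeats when the rest of the grammar fails.
    """
    def walk(pos, state):
        if state == len(grammar):
            return pos
        code, lo, hi = grammar[state]
        count = 0
        while (count < hi and pos + count < len(tokens)
               and has_code(tokens[pos + count], code)):
            count += 1
        for used in range(count, lo - 1, -1):
            end = walk(pos + used, state + 1)
            if end is not None:
                return end
        return None

    return walk(start, 0)


def extract_segments(text, grammar=COMPOUND_TENSE):
    """Scan whitespace-separated tokens and collect every matched segment."""
    tokens = text.split()
    segments = []
    i = 0
    while i < len(tokens):
        end = match_at(tokens, i, grammar)
        if end is None:
            i += 1
        else:
            segments.append(" ".join(tokens[i:end]))
            i = end
    return segments


print(extract_segments("She has never really seen the film"))
```

The grammar refers only to dictionary codes, never to individual words, so adding a new participle or adverb to the dictionary extends coverage without touching the grammar. That separation is the point the sketch tries to convey; a real system would also need proper tokenization, ambiguity handling, and branching graphs rather than a linear pattern.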

Domains

Computational Linguistics [cs]

Contributor: Claude Martineau

https://hal.science/hal-01448712

Submitted on: Saturday, January 28, 2017, 19:27:12

Last modified on: Friday, April 5, 2024, 03:25:20

Dates and versions

hal-01448712 , version 1 (28-01-2017)

Identifiers

HAL Id : hal-01448712 , version 1

Cite

Tita Kyriacopoulou, Claude Martineau, Cristian Martinez, Aggeliki Fotopoulou. Construction of linguistic resources for the extraction of "complex text segments". PARSEME 2nd general meeting, Mar 2014, Athens, Greece. ⟨hal-01448712⟩


Collections

ENPC CNRS LIGM_LINGU PARISTECH LIGM LIGM_MOA ESIEE-PARIS UNIV-EIFFEL LIGM_ADA JSE2024

171 views

0 downloads
