Conference paper, Year: 2017

Comparing Recurring Lexico-Syntactic Trees (RLTs) and Ngram Techniques for Extended Phraseology Extraction: a Corpus-based Study on French Scientific Articles

Abstract

This paper assesses to what extent a syntax-based method, Recurring Lexico-syntactic Tree (RLT) extraction, can extract large phraseological units such as prefabricated routines (e.g. "as previously said" or "as far as we/I know") in scientific writing. To evaluate this method, we compare it with the classical ngram extraction technique on a subset of recurring segments that include speech verbs, drawn from a French corpus of scientific writing. Results show that RLT extraction is far more accurate for extended multiword expressions (MWEs) such as routines or collocations, but performs worse on surface phenomena such as syntactic constructions or fully frozen expressions.
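For readers unfamiliar with the baseline, the following is a minimal sketch of what a classical ngram extraction of recurring segments might look like. It is not the authors' implementation: the function name `recurring_ngrams`, its parameters, the toy English corpus, and the one-word speech-verb list are all assumptions made for illustration. It counts contiguous token sequences and keeps those that recur and contain a speech verb.

```python
from collections import Counter


def recurring_ngrams(sentences, n_min=2, n_max=5, min_freq=2, keywords=None):
    """Count contiguous token n-grams and keep the recurring ones.

    sentences: list of token lists (already tokenized and lowercased).
    keywords:  optional set of tokens; if given, keep only n-grams that
               contain at least one of them (e.g. speech verbs).
    """
    counts = Counter()
    for tokens in sentences:
        for n in range(n_min, n_max + 1):
            # Slide a window of width n over the sentence.
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {
        gram: freq
        for gram, freq in counts.items()
        if freq >= min_freq and (keywords is None or keywords & set(gram))
    }


# Toy corpus invented for this example (not data from the paper).
corpus = [
    "as previously said the results are stable".split(),
    "as previously said the method is simple".split(),
    "the results are said to be stable".split(),
]
speech_verbs = {"said"}  # hypothetical, one-item speech-verb list

result = recurring_ngrams(corpus, n_min=3, n_max=3, min_freq=2,
                          keywords=speech_verbs)
for gram, freq in sorted(result.items()):
    print(" ".join(gram), freq)
```

On this toy input the sketch returns both "as previously said" and the fragment "previously said the", since an ngram is only a contiguous surface sequence and carries no syntactic information.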

Domains

Linguistics
Main file
comparing-recurring-lexico.pdf (242.07 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01524862, version 1 (19-05-2017)

Identifiers

  • HAL Id: hal-01524862, version 1

Cite

Agnès Tutin, Olivier Kraif. Comparing Recurring Lexico-Syntactic Trees (RLTs) and Ngram Techniques for Extended Phraseology Extraction: a Corpus-based Study on French Scientific Articles. 13th Workshop on Multiword Expressions - EACL, Apr 2017, Valencia, Spain. ⟨hal-01524862⟩

Collections

UGA LIDILEM
91 views
51 downloads
