LexSubNC: a Dataset of Lexical Substitution for Nominal Compounds

Rodrigo Wilkens; Leonardo Zilio; Silvio Cordeiro; Felipe S F Paula; Carlos Ramisch; Marco Idiart; Aline Villavicencio

Conference paper, Year: 2017

Rodrigo Wilkens

Role: Author

Instituto de Informática [Porto Alegre]

Leonardo Zilio

Role: Author

Universidade Federal do Rio Grande do Sul [Porto Alegre]

Silvio Cordeiro

Role: Author

Traitement Automatique du Langage Ecrit et Parlé

Felipe S F Paula

Role: Author

Universidade Federal do Rio Grande do Sul [Porto Alegre]

Carlos Ramisch

Role: Author
PersonId: 5103
IdHAL: carlos-ramisch
ORCID: 0000-0001-7466-9039
IdRef: 170720802

Traitement Automatique du Langage Ecrit et Parlé

Marco Idiart

Role: Author

Universidade Federal do Rio Grande do Sul [Porto Alegre]

Aline Villavicencio

Role: Author

Universidade Federal do Rio Grande do Sul [Porto Alegre]

Abstract

In the context of NLP tasks such as text simplification, lexicons containing information about semantically related words are an important resource for evaluating the quality of the system output. Existing resources containing lexical substitutes have been built with a focus on single words. In this paper, we present a lexical substitution dataset for Portuguese nominal compounds. The compounds have varying degrees of compositionality, conventionality and frequency, and we investigate the impact of these characteristics on the lexical substitutes suggested by native speakers. No strong correlations are found between these factors and the number or type of responses provided. However, compositionality has a significant effect on the use of one of the component words (head or modifier) as a substitute. The resulting resource, LexSubNC, contains over 1,500 manually validated substitutes for 180 compounds, further classified according to the type of response.

Domains

Computation and Language [cs.CL]

Main file

W17-6941.pdf (336.79 KB)

Origin: Files produced by the author(s)

Contributor: Carlos Ramisch

https://hal.science/hal-01795956

Submitted on: Friday, May 18, 2018, 19:32:46

Last modified on: Friday, March 22, 2024, 18:24:04

Long-term archiving on: Tuesday, September 25, 2018, 11:49:39

Dates and versions

hal-01795956, version 1 (18-05-2018)

Identifiers

HAL Id: hal-01795956, version 1

Cite

Rodrigo Wilkens, Leonardo Zilio, Silvio Cordeiro, Felipe S F Paula, Carlos Ramisch, et al. LexSubNC: a Dataset of Lexical Substitution for Nominal Compounds. Proceedings of the 12th International Conference on Computational Semantics (IWCS 2017) - Short papers, 2017, Montpellier, France. ⟨hal-01795956⟩

Collections

UNIV-TLN CNRS UNIV-AMU LIS-LAB
