LexSubNC: a Dataset of Lexical Substitution for Nominal Compounds

Abstract : In the context of NLP tasks such as text simplification, lexicons containing information about semantically related words are an important resource for evaluating the quality of the system output. Existing resources containing lexical substitutes have been built with a focus on single words. In this paper, we present a lexical substitution dataset for Portuguese nominal compounds. The compounds have varying degrees of compositionality, conventionality and frequency, and we investigate the impact of these characteristics on the lexical substitutes suggested by native speakers. None of these factors correlates strongly with the number or type of responses provided. However, compositionality has a significant effect on the use of one of the component words (head or modifier) as a substitute. The resulting resource, LexSubNC, contains over 1,500 manually validated substitutes for 180 compounds, further classified according to the type of response.
Document type :
Conference papers

Cited literature [28 references]

Contributor : Carlos Ramisch
Submitted on : Friday, May 18, 2018 - 7:32:46 PM
Last modification on : Wednesday, July 25, 2018 - 1:23:05 AM
Document(s) archived on : Tuesday, September 25, 2018 - 11:49:39 AM


Files produced by the author(s)


  • HAL Id : hal-01795956, version 1



Rodrigo Wilkens, Leonardo Zilio, Silvio Cordeiro, Felipe Paula, Carlos Ramisch, et al. LexSubNC: a Dataset of Lexical Substitution for Nominal Compounds. Proceedings of the 12th International Conference on Computational Semantics (IWCS 2017) - Short papers, 2017, Montpellier, France. ⟨hal-01795956⟩


