Extending the Gold Standard for a Lexical Substitution Task: is it worth it?

Abstract: We present a new evaluation scheme for the lexical substitution task. Following (McCarthy and Navigli, 2007), we conducted an annotation task for French that combines two datasets. In the first, 300 sentences, each containing a target word (drawn from a set of 30 different words), were submitted to annotators who were asked to provide substitutes. The second contains the substitutes proposed by the systems that participated in the lexical substitution task based on the same data. The idea is, first, to assess the capacity of the systems to provide good substitutes that the annotators would not have proposed and, second, to measure the impact on the task evaluation of a new gold standard that incorporates these additional data. While (McCarthy and Navigli, 2009) conducted a similar post hoc analysis, a re-evaluation of the systems' performance has not, to our knowledge, been carried out. Our experiment shows interesting differences between the two resulting datasets and gives insight into how automatically retrieved substitutes can provide complementary data for a lexical production task, without, however, a major impact on the evaluation of the systems.
Contributor: Ludovic Tanguy
Submitted on: Wednesday, May 16, 2018 - 2:44:38 PM
  • HAL Id: hal-01793360, version 1



Ludovic Tanguy, Cécile Fabre, Laura Rivière. Extending the Gold Standard for a Lexical Substitution Task: is it worth it?. LREC, May 2018, Miyazaki, Japan. ⟨hal-01793360⟩


