Conference paper, Year: 2016

Lexical Knowledge Acquisition: Towards a Continuous and Flexible Representation of the Lexicon

Abstract

The automatic acquisition of lexical knowledge is an important issue for natural language processing. A great deal of work has been done in this domain over the past two decades, but we think there is still room for improvement, as we need to develop models that are both efficient and cognitively plausible. In this paper, we focus on verbs, since the verb is the pivot of the sentence, and we take a closer look at two fundamental aspects of the description of the verb: the notion of lexical item and the distinction between arguments and adjuncts. Following up on studies in natural language processing and linguistics, we embrace the double hypothesis i) of a continuum between ambiguity and vagueness, and ii) of a continuum between arguments and adjuncts. We provide a complete approach to lexical knowledge acquisition of verbal constructions from an untagged news corpus. The approach is evaluated through the analysis of a sample of the 7,000 Japanese verbs automatically described by the system. This paper aims at showing that lexical descriptions based on multifactorial and continuous models can be used by both linguists and lexicographers, and provide a cognitively interesting model for lexical semantics. Our results are available online at: http://marchal.er-tim.fr/ikf/.

1 Background and Motivations

"You shall know a word by the company it keeps" [Firth, 1957]. This well-known quotation from J.R. Firth motivates any lexicographic work today: it is widely accepted that word description cannot be achieved without the analysis of a large number of contexts extracted from real corpora. But this is not enough. The recent success of deep learning approaches has shown that static representations of the lexicon are no longer appropriate. Continuous models offer a better representation of word meaning, because they encode intuitively valid and cognitively plausible principles: semantic similarity is relative, context-sensitive and depends on multiple-cue integration.
However, these models have not been used for representing meaning in dictionaries written for humans. One may think that these models are complex and convenient for machines, but that they are too abstract for humans. In this paper we defend the opposite idea. If continuous models offer a better representation of the lexicon, we must design new lexical databases that are usable by humans and have the same basis as these continuous models. There are arguments to support this view. For example, it has been demonstrated that semantic categories have fuzzy boundaries and thus the number of word meanings per lexical item is to a large extent arbitrary [Tuggy, 1993]. Although this still fuels much discussion among linguists and lexicographers, we claim that a description can be more or less fine-grained while keeping the same accuracy and validity. Moreover, it has been demonstrated that lexical entries in traditional dictionaries overlap and that different word meanings can be associated with the same example [Erk and McCarthy, 2009], showing that meaning cannot be sliced into separate and exclusive word senses. The same problem also arises when it comes to differentiating arguments and adjuncts. As stated in [Manning, 2003]: 'There are some very clear arguments (normally, subjects and objects), and some very clear adjuncts (of
Main file
ijcai_cognitum.pdf (387.6 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01420714, version 1 (20-12-2016)

Identifiers

  • HAL Id: hal-01420714, version 1

Cite

Pierre Marchal, Thierry Poibeau. Lexical Knowledge Acquisition: Towards a Continuous and Flexible Representation of the Lexicon. Workshop on Cognitive Knowledge Acquisition and Applications, Jul 2016, New York, United States. ⟨hal-01420714⟩
