Lexical Knowledge Acquisition: Towards a Continuous and Flexible Representation of the Lexicon

Pierre Marchal; Thierry Poibeau

Conference paper, Year: 2016


Pierre Marchal

Role: Author
PersonId: 991240

Équipe de Recherche en Textes, Informatique, Multilinguisme

Thierry Poibeau

Role: Author
PersonId: 472
IdHAL: thierry-poibeau
ORCID: 0000-0003-3669-4051
IdRef: 069992258

Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094

Abstract

The automatic acquisition of lexical knowledge is an important issue for natural language processing. A great deal of work has been done in this domain over the past two decades, but we think there is still room for improvement, as we need to develop models that are both efficient and cognitively plausible. In this paper, we focus on verbs, since the verb is the pivot of the sentence, and we take a closer look at two fundamental aspects of the description of the verb: the notion of lexical item and the distinction between arguments and adjuncts. Following up on studies in natural language processing and linguistics, we embrace the double hypothesis (i) of a continuum between ambiguity and vagueness, and (ii) of a continuum between arguments and adjuncts. We provide a complete approach to the acquisition of lexical knowledge about verbal constructions from an untagged news corpus. The approach is evaluated through the analysis of a sample of the 7,000 Japanese verbs automatically described by the system. This paper aims to show that lexical descriptions based on multifactorial and continuous models can be used by both linguists and lexicographers, and provide a cognitively interesting model for lexical semantics. Our results are available online at: http://marchal.er-tim.fr/ikf/.

1 Background and Motivations

"You shall know a word by the company it keeps" [Firth, 1957]. This all too well-known quotation from J.R. Firth motivates any lexicographic work today: it is widely accepted that word description cannot be achieved without the analysis of a large number of contexts extracted from real corpora. But this is not enough. The recent success of deep learning approaches has shown that static representations of the lexicon are no longer appropriate. Continuous models offer a better representation of word meaning, because they encode intuitively valid and cognitively plausible principles: semantic similarity is relative, context-sensitive and depends on multiple-cue integration.
However, these models have not been used for representing meaning in dictionaries written for humans. One may think that these models are complex and convenient for machines, but too abstract for humans. In this paper we defend the opposite idea. If continuous models offer a better representation of the lexicon, we must design new lexical databases that are usable by humans and rest on the same basis as these continuous models. There are arguments to support this view. For example, it has been demonstrated that semantic categories have fuzzy boundaries, and thus that the number of word meanings per lexical item is to a large extent arbitrary [Tuggy, 1993]. Although this still fuels much discussion among linguists and lexicographers, we claim that a description can be more or less fine-grained while keeping the same accuracy and validity. Moreover, it has been demonstrated that lexical entries in traditional dictionaries overlap and that different word meanings can be associated with the same example [Erk and McCarthy, 2009], showing that meaning cannot be sliced into separate and mutually exclusive word senses. The same problem also arises when it comes to differentiating arguments and adjuncts. As said in [Manning, 2003]: 'There are some very clear arguments (normally, subjects and objects), and some very clear adjuncts (of

Keywords

Lexical acquisition; Japanese

Domains

Document and text processing; Artificial intelligence [cs.AI]; Computational linguistics; Linguistics; Methods and statistics; Information and communication sciences

Main file

ijcai_cognitum.pdf (387.6 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-01420714

Submitted on: Tuesday, 20 December 2016, 22:33:34

Last modified on: Friday, 24 March 2023, 14:53:03

Long-term archiving on: Monday, 20 March 2017, 22:23:45

Dates and versions

hal-01420714, version 1 (20-12-2016)

Identifiers

HAL Id: hal-01420714, version 1

Cite

Pierre Marchal, Thierry Poibeau. Lexical Knowledge Acquisition: Towards a Continuous and Flexible Representation of the Lexicon. Workshop on Cognitive Knowledge Acquisition and Applications, Jul 2016, New York, United States. ⟨hal-01420714⟩


Collections

ENS-PARIS CNRS UNIV-PARIS3 LATTICE INALCO ERTIM CAMPUS-AAR AAI PSL USPC

