Facing the facts of fake: a distributional semantics and corpus annotation approach

Bert Cappelle (1), Pascal Denis (2), Mikaela Keller (2)
(2) MAGNET - Machine Learning in Information Networks, LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe
Abstract: Fake is often considered the textbook example of a so-called 'privative' adjective, one which, in other words, allows the proposition that '(a) fake x is not (an) x'. This study tests the hypothesis that the contexts of an adjective-noun combination are more different from the contexts of the noun when the adjective is such a 'privative' one than when it is an ordinary (subsective) one. Here we use 'embeddings', that is, dense vector representations based on word co-occurrences in a large corpus, which in our study is the entire English Wikipedia as it was in 2013. Comparing the cosine distance between the adjective-noun bigram and single noun embeddings across two sets of adjectives, privative and ordinary ones, we fail to find a noticeable difference. However, we contest that fake is an across-the-board privative adjective, since a fake article, for instance, is most definitely still an article. We extend a recent proposal involving the noun's qualia roles (how an entity is made, what it consists of, what it is used for, etc.) and propose several interpretational types of fake-noun combinations, some but not all of which are privative. These interpretations, which we assign manually to the 100 most frequent fake-noun combinations in the Wikipedia corpus, depend to a large extent on the meaning of the noun, as combinations with similar interpretations tend to involve nouns that are linked in a distribution-based network. When we restrict our focus to the privative uses of fake only, we do detect a slightly enlarged difference between fake + noun bigram and noun distributions compared to the previously obtained average difference between adjective + noun bigram and noun distributions. This result contrasts with negative or even opposite findings reported in the literature.
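The comparison described in the abstract (cosine distance between an adjective-noun bigram vector and the vector of its noun, averaged over a privative and an ordinary set of adjectives) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each bigram has its own vector stored under a hypothetical key such as `fake_gun`, and the 3-dimensional vectors are invented toy values, not embeddings learned from Wikipedia.

```python
import math
from statistics import mean


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def mean_bigram_noun_distance(embeddings, pairs):
    """Average distance between each 'adj_noun' bigram vector and its noun vector.

    Assumes (hypothetically) that bigrams are stored as single tokens joined by '_'.
    """
    return mean(
        cosine_distance(embeddings[f"{adj}_{noun}"], embeddings[noun])
        for adj, noun in pairs
    )


# Hypothetical toy vectors; real embeddings would be learned from a large corpus.
toy_embeddings = {
    "gun": [1.0, 0.0, 0.0],
    "fake_gun": [0.6, 0.8, 0.0],      # drifts away from "gun"
    "red_gun": [1.0, 0.1, 0.0],
    "article": [0.0, 1.0, 0.0],
    "fake_article": [0.0, 1.0, 0.2],  # stays close: a fake article is still an article
    "long_article": [0.1, 1.0, 0.0],
}

privative_pairs = [("fake", "gun"), ("fake", "article")]
subsective_pairs = [("red", "gun"), ("long", "article")]

priv = mean_bigram_noun_distance(toy_embeddings, privative_pairs)
subs = mean_bigram_noun_distance(toy_embeddings, subsective_pairs)
print(f"privative mean distance: {priv:.3f}, subsective mean distance: {subs:.3f}")
```

The toy `fake_article` entry mirrors the abstract's point that not every use of fake is privative: its vector barely moves away from `article`, which pulls the privative group's average down. Splitting the fake pairs by interpretation before averaging corresponds to restricting the focus to privative uses only.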
Document type:
Journal article
Yearbook of the German Cognitive Linguistics Association, Walter de Gruyter Berlin/Boston, 2018, 6 (9-42)

https://hal.archives-ouvertes.fr/hal-01959609
Contributor: Bert Cappelle
Submitted on: Tuesday, 18 December 2018, 18:54:00
Last modified on: Thursday, 21 February 2019, 10:52:55

File

Facing the facts of fake __ Au...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01959609, version 1

Citation

Bert Cappelle, Pascal Denis, Mikaela Keller. Facing the facts of fake: a distributional semantics and corpus annotation approach. Yearbook of the German Cognitive Linguistics Association, Walter de Gruyter Berlin/Boston, 2018, 6 (9-42). 〈hal-01959609〉