Facing the facts of fake: a distributional semantics and corpus annotation approach

Bert Cappelle (1), Pascal Denis (2), Mikaela Keller (2)
(2) MAGNET - Machine Learning in Information Networks
CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189, Inria Lille - Nord Europe
Abstract: Fake is often considered the textbook example of a so-called 'privative' adjective, one which, in other words, allows the proposition that '(a) fake x is not (an) x'. This study tests the hypothesis that the contexts of an adjective-noun combination are more different from the contexts of the noun when the adjective is such a 'privative' one than when it is an ordinary (subsective) one. We here use 'embeddings', that is, dense vector representations based on word co-occurrences in a large corpus, which in our study is the entire English Wikipedia as it was in 2013. Comparing the cosine distance between the adjective-noun bigram and single noun embeddings across two sets of adjectives, privative and ordinary ones, we fail to find a noticeable difference. However, we contest that fake is an across-the-board privative adjective, since a fake article, for instance, is most definitely still an article. We extend a recent proposal involving the noun's qualia roles (how an entity is made, what it consists of, what it is used for, etc.) and propose several interpretational types of fake-noun combinations, some but not all of which are privative. These interpretations, which we assign manually to the 100 most frequent fake-noun combinations in the Wikipedia corpus, depend to a large extent on the meaning of the noun, as combinations with similar interpretations tend to involve nouns that are linked in a distribution-based network. When we restrict our focus to the privative uses of fake only, we do detect a slightly enlarged difference between fake + noun bigram and noun distributions compared to the previously obtained average difference between adjective + noun bigram and noun distributions. This result contrasts with negative or even opposite findings reported in the literature.
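The comparison described in the abstract (cosine distance between an adjective-noun bigram embedding and the embedding of the bare noun, averaged over a set of adjectives) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-dimensional toy vectors are invented, the underscore-joined bigram keys such as `fake_gun` are a hypothetical naming convention, and the helper names `cosine_distance` and `mean_bigram_noun_distance` do not come from the study. The numbers it produces say nothing about the results reported above.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def mean_bigram_noun_distance(pairs, embeddings):
    """Average the bigram-to-noun cosine distance over (bigram, noun) pairs."""
    distances = [
        cosine_distance(embeddings[bigram], embeddings[noun])
        for bigram, noun in pairs
    ]
    return sum(distances) / len(distances)


# Invented toy vectors (assumption): real embeddings would be dense vectors
# learned from co-occurrences in a large corpus.
toy_embeddings = {
    "gun": [1.0, 0.0, 0.0],
    "fake_gun": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
    "red_car": [0.0, 0.0, 2.0],
}

# Hypothetical pair lists standing in for the two adjective sets.
privative_pairs = [("fake_gun", "gun")]
ordinary_pairs = [("red_car", "car")]

privative_mean = mean_bigram_noun_distance(privative_pairs, toy_embeddings)
ordinary_mean = mean_bigram_noun_distance(ordinary_pairs, toy_embeddings)
print(privative_mean, ordinary_mean)
```

Cosine distance ignores vector length, so a bigram vector that merely rescales the noun vector counts as identical in direction; only a change in the distribution of contexts increases the distance.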
Document type: Journal article

https://hal.archives-ouvertes.fr/hal-01959609
Contributor: Bert Cappelle
Submitted on: Tuesday, December 18, 2018 - 6:54:00 PM
Last modification on: Friday, March 22, 2019 - 1:34:27 AM
Document(s) archived on: Wednesday, March 20, 2019 - 9:51:28 AM

File

Files produced by the author(s)

Identifiers

  • HAL Id : hal-01959609, version 1

Citation

Bert Cappelle, Pascal Denis, Mikaela Keller. Facing the facts of fake: a distributional semantics and corpus annotation approach. Yearbook of the German Cognitive Linguistics Association, Walter de Gruyter Berlin/Boston, 2018, 6 (9-42). ⟨hal-01959609⟩
