Detection of computer generated papers in scientific literature

Abstract: Meaningless computer-generated scientific texts can be used in several ways. For example, they have allowed Ike Antkare to become one of the most highly cited scientists of the modern world. Such fake publications are also appearing in real scientific conferences and, as a result, in bibliographic services (Scopus, ISI Web of Knowledge, Google Scholar, ...). Recently, more than 120 papers were withdrawn from the subscription databases of two high-profile publishers, IEEE and Springer, because they had been generated with the SCIgen software. This software, based on a Probabilistic Context-Free Grammar (PCFG), was designed to randomly generate computer science research papers. Together with PCFG, Markov Chains (MC) are the main ways to generate meaningless texts. This paper presents the main characteristics of texts generated by PCFG and MC. For the time being, PCFG generators are quite easy to spot automatically, using intertextual distance combined with automatic clustering, because these generators behave like authors with specific features such as a very low vocabulary richness and unusual sentence structures. This shows that quantitative tools are effective at characterizing the originality (or banality) of authors' language.
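
The abstract names intertextual distance as the measure used for detection. As a rough illustration only, and not the authors' implementation, the Python sketch below computes a simplified variant: half the sum of absolute differences between the relative word frequencies of two texts, which ranges from 0 (identical frequency profiles) to 1 (no shared vocabulary). The function names `tokenize` and `intertextual_distance` and the whitespace tokenization are assumptions made for this example; the chapter's own definition and preprocessing may differ.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def intertextual_distance(tokens_a, tokens_b):
    """Simplified intertextual distance between two token lists.

    Assumption: this is half the L1 distance between relative word
    frequencies, used here as a stand-in for the measure named in the
    abstract. It returns a value between 0 and 1.
    """
    if not tokens_a or not tokens_b:
        raise ValueError("both texts must contain at least one token")

    freq_a = Counter(tokens_a)
    freq_b = Counter(tokens_b)
    n_a = len(tokens_a)
    n_b = len(tokens_b)

    # Compare relative frequencies over the union of both vocabularies.
    # Counter returns 0 for words missing from one of the texts.
    vocabulary = set(freq_a) | set(freq_b)
    total = sum(abs(freq_a[w] / n_a - freq_b[w] / n_b) for w in vocabulary)
    return 0.5 * total
```

In a detection setting, one would compute this distance between every pair of documents in a corpus and pass the resulting matrix to a clustering step; texts produced by the same generator would then be expected to fall close to one another.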
Document type:
Book chapter
Mirko Degli Esposti; Eduardo G. Altmann; François Pachet. Creativity and Universality in Language, 2016

Cited literature [37 references]

https://hal.archives-ouvertes.fr/hal-01134598
Contributor: Cyril Labbé
Submitted on: Tuesday, 24 March 2015 - 08:44:30
Last modified on: Monday, 11 February 2019 - 16:36:02
Document(s) archived on: Monday, 17 April 2017 - 22:55:05

File

Lab_Lab_Port.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01134598, version 1

Citation

Cyril Labbé, Dominique Labbé, François Portet. Detection of computer generated papers in scientific literature. Mirko Degli Esposti; Eduardo G. Altmann; François Pachet. Creativity and Universality in Language, 2016. 〈hal-01134598〉

Metrics

Record views

810

File downloads

301