Curious Cases of Automatically Generated Text and Detecting Probabilistic Context Free Grammar Sentences with Grammatical Structure Similarity - HAL open archive
Conference paper. Year: 2017

Abstract

Automatically generated papers have been used to manipulate bibliographic indexes on numerous occasions. This paper examines different means of generating text, such as a recurrent neural network, a Markov model, or a probabilistic context free grammar, and whether such text can be detected using current approaches. We then focus on probabilistic context free grammars (PCFG), the most widely used of these methods. Although multiple approaches to detecting such papers exist, they all work at the document level and are unable to detect a small amount of generated text inside a larger body of genuinely written text. We therefore present the Grammatical Structure Similarity (GSS) measure to detect sentences or short fragments of text automatically generated by known PCFG generators. The proposed approach is compared with a pattern checker and various common machine learning methods. We also test its ability to detect text produced by a modified generator.
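For readers unfamiliar with PCFG generators, the following minimal Python sketch shows how a probabilistic context free grammar produces sentences: each nonterminal is expanded by sampling one of its productions according to its probability, recursively, until only terminal words remain. The toy grammar, its rules and probabilities, and the function name `generate` are hypothetical illustrations; this is neither the authors' GSS method nor the grammar of any real generator.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to a list of
# (production, probability) pairs; probabilities for a nonterminal sum to 1.
# Symbols that are not keys of GRAMMAR are treated as terminal words.
GRAMMAR = {
    "S": [(["NP", "VP", "."], 1.0)],
    "NP": [(["our", "method"], 0.5), (["the", "framework"], 0.5)],
    "VP": [(["improves", "NP"], 0.6), (["refutes", "NP"], 0.4)],
}


def generate(symbol: str, rng: random.Random) -> list[str]:
    """Expand `symbol` by recursively sampling productions of the toy PCFG."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit the word as-is
    productions, weights = zip(*GRAMMAR[symbol])
    # Pick one right-hand side with probability proportional to its weight.
    chosen = rng.choices(productions, weights=weights, k=1)[0]
    words: list[str] = []
    for child in chosen:
        words.extend(generate(child, rng))
    return words


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible output
    for _ in range(3):
        print(" ".join(generate("S", rng)))
```

Because every output is derived from a fixed set of productions, sentences from the same grammar tend to repeat the same grammatical skeleton (here always NP VP). That is one plausible intuition for why a structure-based measure could flag individual generated sentences, though the paper's actual GSS measure is not reproduced here.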
Main file
newsource.pdf (1.61 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01654726, version 1 (4 December 2017)

Identifiers

  • HAL Id: hal-01654726, version 1

Cite

Nguyen Minh Tien, Cyril Labbé. Curious Cases of Automatically Generated Text and Detecting Probabilistic Context Free Grammar Sentences with Grammatical Structure Similarity. Proceedings of the Fifth Workshop on Bibliometric-enhanced Information Retrieval (BIR) co-located with the 39th European Conference on Information Retrieval (ECIR 2017), Apr 2017, Aberdeen, United Kingdom. ⟨hal-01654726⟩