Conference paper, Year: 2015

Controlled generation of synthetic corpora for NLP evaluation

Abstract

Automatic processing is mandatory to build a global and fair view of the opinions and sentiments expressed on the web through comments and reviews. Various Extracting Tools (ETs) exist to automatically analyse comments and reviews; however, checking the accuracy of such tools remains quite challenging. We propose a new approach for that purpose. The main idea is to use a data-to-text approach to generate a synthetic corpus that can be used to validate ETs. The data represent what has to be said about something and in which proportion (e.g., 45% of the reviews say the room is small). A set of reviews (the synthetic corpus) is then generated from these data, and the correctness of an ET can be assessed by how faithfully its output reflects the original data.
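
The workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the opinions, the sentence templates, the function names and the substring-based stand-in for an ET are all hypothetical, chosen only to show how a corpus generated from known proportions lets one measure how far a tool's output drifts from the original data.

```python
import random

# Hypothetical input data: the share of reviews that should express each opinion.
SPEC = {"the room is small": 0.45, "the staff is friendly": 0.30}

# Hypothetical surface templates; a real data-to-text system would be far richer.
TEMPLATES = ["I found that {}.", "Honestly, {}.", "In my experience, {}."]


def generate_corpus(spec, n_reviews, seed=0):
    """Generate n_reviews short reviews whose opinion shares follow spec."""
    rng = random.Random(seed)
    reviews = [[] for _ in range(n_reviews)]
    for opinion, proportion in spec.items():
        k = round(proportion * n_reviews)
        # Pick exactly k distinct reviews that will state this opinion.
        for idx in rng.sample(range(n_reviews), k):
            reviews[idx].append(rng.choice(TEMPLATES).format(opinion))
    # Reviews that received no opinion get a neutral filler sentence.
    return [" ".join(r) if r else "Nothing to report." for r in reviews]


def toy_extractor(corpus, opinions):
    """Stand-in for an ET: share of reviews mentioning each opinion verbatim."""
    return {o: sum(o in review for review in corpus) / len(corpus) for o in opinions}


def max_abs_error(spec, measured):
    """Largest gap between the specified and the extracted proportions."""
    return max(abs(spec[o] - measured[o]) for o in spec)


corpus = generate_corpus(SPEC, n_reviews=200)
measured = toy_extractor(corpus, SPEC)
error = max_abs_error(SPEC, measured)
print(measured, error)
```

Because the proportions are fixed before generation, any gap reported by `max_abs_error` is attributable to the extractor rather than to uncertainty about the corpus. With the verbatim-matching toy extractor the gap is zero by construction; a real ET run on more varied generated text would be expected to show a non-zero gap.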
Main file
document.pdf (82.02 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03177929, version 1 (23-03-2021)

Identifiers

  • HAL Id: hal-03177929, version 1

Cite

Jérémie Démarchez, Cyril Labbé. Controlled generation of synthetic corpora for NLP evaluation. 1st Workshop on Data-to-text Generation, Mar 2015, Edinburgh, United Kingdom. ⟨hal-03177929⟩