Generation of Company descriptions using concept-to-text and text-to-text deep models: dataset collection and systems evaluation

Abstract: In this paper we study the performance of several state-of-the-art sequence-to-sequence models applied to the generation of short company descriptions. The models are evaluated on a newly created, publicly available company dataset collected from Wikipedia. The dataset consists of around 51K company descriptions and can be used for both concept-to-text and text-to-text generation tasks. Automatic metrics and human evaluation scores computed on the generated company descriptions show promising results, despite the difficulty of the task: like most available datasets, this one was not originally designed for machine learning. In addition, we perform a correlation analysis between automatic metrics and human evaluations and show that certain automatic metrics correlate more strongly with human judgments than others.
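The correlation analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes one automatic-metric score and one human rating per generated description, and it computes Pearson and Spearman coefficients using only the Python standard library. The paper's exact procedure, metrics and data are not given here. The metric names and scores below are hypothetical placeholders, not results from the paper.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical per-description scores (illustration only, not paper data).
    human_ratings = [3.0, 4.5, 2.0, 5.0, 3.5]
    metric_scores = {
        "metric_a": [0.21, 0.40, 0.15, 0.52, 0.30],
        "metric_b": [0.60, 0.55, 0.58, 0.70, 0.50],
    }
    for name, scores in metric_scores.items():
        print(name,
              round(pearson(scores, human_ratings), 3),
              round(spearman(scores, human_ratings), 3))
```

Spearman correlation is often reported alongside Pearson because human ratings are usually ordinal. Comparing the coefficients across metrics is one way to judge which automatic metrics track human judgments more closely.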
Document type:
Conference paper
11th International Conference on Natural Language Generation, Nov 2018, Tilburg, Netherlands
https://hal.archives-ouvertes.fr/hal-01950467
Contributor: François Portet
Submitted on: Monday, 10 December 2018, 20:34:59
Last modified on: Monday, 11 February 2019, 16:36:02

File

inlg2018-companies.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01950467, version 1

Citation

Raheel Qader, Khoder Jneid, François Portet, Cyril Labbé. Generation of Company descriptions using concept-to-text and text-to-text deep models: dataset collection and systems evaluation. 11th International Conference on Natural Language Generation, Nov 2018, Tilburg, Netherlands. 〈hal-01950467〉
