Detection of Reformulations in Spoken French

Abstract: This work addresses the automatic detection of enunciations and segments containing reformulations in French spoken corpora. The proposed approach is syntagmatic and relies on reformulation markers and on specificities of spoken language. The reference data are built manually and have gone through a consensus process. Automatic methods, based on rules and on CRF machine learning, are proposed to detect the enunciations and segments that contain reformulations. With the CRF models, different features are exploited within context windows of various sizes. Detection of enunciations with reformulations reaches up to 0.66 precision. The tests on the detection of reformulated segments indicate that the task remains difficult: the best average performance values reach up to 0.65 F-measure, 0.75 precision, and 0.63 recall. Future work includes improving the detection of reformulated segments and studying the data from other points of view.
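
To illustrate what exploiting features within a window can look like for a CRF-style sequence labeller, here is a minimal Python sketch. It is not the authors' implementation: the marker set, the feature names, the padding symbol, and the function names are all hypothetical, and the abstract does not specify which features or window sizes were used.

```python
from typing import Dict, List

# Hypothetical marker set, for illustration only; the abstract does not
# enumerate the reformulation markers actually used.
MARKERS = {"c'est-à-dire"}


def window_features(tokens: List[str], i: int, size: int = 2) -> Dict[str, str]:
    """Build features for token i from the tokens at positions i-size .. i+size."""
    feats = {"bias": "1"}
    for offset in range(-size, size + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            word = tokens[j].lower()
            feats[f"w[{offset}]"] = word
            # Flag whether the token at this offset is a (hypothetical) marker.
            feats[f"is_marker[{offset}]"] = str(word in MARKERS)
        else:
            # Positions outside the enunciation are padded.
            feats[f"w[{offset}]"] = "<PAD>"
    return feats


def sentence_features(tokens: List[str], size: int = 2) -> List[Dict[str, str]]:
    """One feature dictionary per token, as CRF toolkits typically expect."""
    return [window_features(tokens, i, size) for i in range(len(tokens))]


tokens = "il est parti c'est-à-dire il a démissionné".split()
feats = sentence_features(tokens, size=1)
```

Changing `size` widens or narrows the context each token sees, which is one plausible reading of "windows of various sizes" in the abstract.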
Document type:
Conference paper
LREC (Language Resources and Evaluation Conference) 2016, May 2016, Portorož, Slovenia

https://hal.archives-ouvertes.fr/hal-01426788
Contributor: Natalia Grabar
Submitted on: Wednesday, January 4, 2017 - 21:45:42
Last modified on: Tuesday, July 3, 2018 - 11:47:24
Document(s) archived on: Wednesday, April 5, 2017 - 15:30:16

File

grabar-LREC2016reform.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01426788, version 1

Citation

Natalia Grabar, Iris Eshkol-Taravela. Detection of Reformulations in Spoken French. LREC (Language Resources and Evaluation Conference) 2016, May 2016, Portorož, Slovenia. 〈hal-01426788〉
