Benchmarking benchmarks: introducing new automatic indicators for benchmarking Spoken Language Understanding corpora

Abstract: Empirical evaluation is nowadays the main evaluation paradigm in Natural Language Processing for assessing the relevance of a new machine-learning-based model. While large corpora are available for tasks such as Automatic Speech Recognition, this is not the case for other tasks such as Spoken Language Understanding (SLU), which consists in translating spoken transcriptions into a formal representation often based on semantic frames. Corpora such as ATIS or SNIPS are widely used to compare systems; however, differences in performance among systems are often very small, not statistically significant, and can be produced by biases in the data collection or the annotation scheme, as we showed on the ATIS corpus ("Is ATIS too shallow?", Interspeech 2018). In this study we propose a new methodology for assessing the relevance of an SLU corpus. We claim that taking into account only system performance does not provide enough insight into what is covered by current state-of-the-art models and what is left to be done. We apply our methodology to a set of 4 SLU systems and 5 benchmark corpora (ATIS, SNIPS, M2M, MEDIA) and automatically produce several indicators assessing the relevance (or not) of each corpus for benchmarking SLU models.
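The abstract points out that performance differences between SLU systems on these benchmarks are often not statistically significant. As an illustration only, not the indicators proposed in the paper, the sketch below shows a paired bootstrap significance test over per-utterance scores of two systems; the function name, inputs, and toy scores are hypothetical.

```python
import random

def paired_bootstrap_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Paired bootstrap test: estimate how often system A fails to beat
    system B when the evaluation utterances are resampled with replacement.

    scores_a, scores_b: per-utterance scores (e.g. 1.0 if the predicted
    semantic frame is fully correct, else 0.0), aligned on the same test
    utterances. These inputs and the scoring scheme are illustrative only.
    """
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    observed_delta = sum(scores_a) / n - sum(scores_b) / n
    worse_or_equal = 0
    for _ in range(n_resamples):
        # Resample utterance indices; the same indices are used for both
        # systems, which is what makes the test "paired".
        idx = [rng.randrange(n) for _ in range(n)]
        delta = sum(scores_a[i] for i in idx) / n - sum(scores_b[i] for i in idx) / n
        if delta <= 0:
            worse_or_equal += 1
    # Approximate p-value for "A is not actually better than B on this corpus".
    return observed_delta, worse_or_equal / n_resamples

if __name__ == "__main__":
    # Toy example: two systems whose exact-match accuracies differ by 0.5 points.
    sys_a = [1.0] * 960 + [0.0] * 40   # 96.0% accuracy
    sys_b = [1.0] * 955 + [0.0] * 45   # 95.5% accuracy
    delta, p = paired_bootstrap_test(sys_a, sys_b)
    print(f"delta={delta:.3f}  p={p:.3f}")
```

With differences this small on a 1000-utterance test set, the p-value typically stays well above common significance thresholds, which is the kind of situation the paper argues calls for corpus-level indicators rather than leaderboard comparisons alone.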

https://hal.archives-ouvertes.fr/hal-02270633

Identifiers

  • HAL Id: hal-02270633, version 1

Citation

Frédéric Béchet, Christian Raymond. Benchmarking benchmarks: introducing new automatic indicators for benchmarking Spoken Language Understanding corpora. InterSpeech, Sep 2019, Graz, Austria. ⟨hal-02270633⟩
