ASR performance prediction on unseen broadcast programs using convolutional neural networks

In this paper, we address a relatively new task: predicting ASR performance on unseen broadcast programs. We first propose a heterogeneous French corpus dedicated to this task. Two prediction approaches are compared: a state-of-the-art performance prediction method based on regression (engineered features) and a new strategy based on convolutional neural networks (learnt features). We focus in particular on combining textual (ASR transcription) and signal inputs. While the joint use of textual and signal features did not work for the regression baseline, combining both inputs in the CNN leads to the best WER prediction performance. We also show that our CNN approach predicts the WER distribution over a collection of speech recordings remarkably well.
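For readers unfamiliar with the quantity being predicted, the word error rate (WER) of a transcript is conventionally the word-level edit distance between the reference and the ASR hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition only, not the authors' prediction system; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Assumption: words are obtained by splitting on whitespace, with no
    text normalization (real scoring tools usually normalize first).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

A performance predictor of the kind described in the abstract would estimate this value for a recording without access to the reference transcript.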

Keywords

Convolutional Neural Networks; Performance Prediction; Large Vocabulary Continuous Speech Recognition

Domains

Computation and Language [cs.CL]

Main file

20180214040820_904429_2819.pdf (346.84 KB)

Origin: Files produced by the author(s)

Contributor: Laurent Besacier

https://hal.science/hal-01709779

Submitted on: Thursday, February 15, 2018, 11:43:02

Last modified on: Monday, April 15, 2024, 11:25:23

Long-term archiving on: Sunday, May 6, 2018, 03:55:14

Dates and versions

hal-01709779, version 1 (15-02-2018)

Identifiers

HAL Id: hal-01709779, version 1

Cite

Zied Elloumi, Laurent Besacier, Olivier Galibert, Juliette Kahn, Benjamin Lecouteux. ASR performance prediction on unseen broadcast programs using convolutional neural networks. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2018, Calgary, Alberta, Canada. ⟨hal-01709779⟩


Collections

UGA CNRS CNAM LIG LIG_TDCGE_GETALP PERSYVAL-LAB LNE POLYTECH-GRENOBLE LNE-CNAM ANR LIG_SIDCH HESAM
