ASR PERFORMANCE PREDICTION ON UNSEEN BROADCAST PROGRAMS USING CONVOLUTIONAL NEURAL NETWORKS

Abstract: In this paper, we address a relatively new task: predicting ASR performance on unseen broadcast programs. We first propose a heterogeneous French corpus dedicated to this task. Two prediction approaches are compared: a state-of-the-art performance prediction method based on regression (engineered features) and a new strategy based on convolutional neural networks (learnt features). We focus in particular on combining textual (ASR transcription) and signal inputs. While the joint use of textual and signal features did not work for the regression baseline, combining both inputs for CNNs leads to the best word error rate (WER) prediction performance. We also show that our CNN-based predictor predicts the WER distribution over a collection of speech recordings remarkably well.
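
For readers unfamiliar with the quantity being predicted, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR output, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not taken from the paper, the function name `word_error_rate` is hypothetical, and whitespace tokenization without text normalization is a simplifying assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words.

    Assumption: words are obtained by splitting on whitespace, with no
    case folding or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, so a predictor of WER should not assume its target lies in the unit interval.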
Document type:
Conference paper
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2018, Calgary, Alberta, Canada

https://hal.archives-ouvertes.fr/hal-01709779
Contributor: Laurent Besacier
Submitted on: Thursday, February 15, 2018 - 11:43:02
Last modified on: Thursday, October 11, 2018 - 08:48:03
Document(s) archived on: Sunday, May 6, 2018 - 03:55:14

File

20180214040820_904429_2819.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01709779, version 1

Citation

Zied Elloumi, Laurent Besacier, Olivier Galibert, Juliette Kahn, Benjamin Lecouteux. ASR PERFORMANCE PREDICTION ON UNSEEN BROADCAST PROGRAMS USING CONVOLUTIONAL NEURAL NETWORKS. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2018, Calgary, Alberta, Canada. 〈hal-01709779〉

Metrics

Record views

471

File downloads

198