Conference paper. Year: 2011

Using the FASST source separation toolbox for noise robust speech recognition

Abstract

We describe our submission to the 2011 CHiME Speech Separation and Recognition Challenge. Our speech separation algorithm was built using the Flexible Audio Source Separation Toolbox (FASST), which we developed recently. This toolbox implements a general flexible framework based on a library of structured source models that enable the incorporation of prior knowledge about a source separation problem via user-specifiable constraints. We show how to use FASST to develop an efficient speech separation algorithm for the CHiME dataset. We also describe the acoustic model training and adaptation strategies we used for this submission. Altogether, compared to the baseline system, we obtain an improvement in keyword recognition accuracy in all conditions. The largest improvement, of about 40%, is achieved in the worst condition of -6 dB signal-to-noise ratio (SNR), where 18% of this improvement is due to the speech separation. The improvement decreases as the SNR increases. These results indicate that audio source separation can be very helpful for improving speech recognition in noisy or multi-source environments.
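The abstract does not detail FASST's interface, but the idea of constraining a structured source model with prior knowledge can be illustrated with a minimal sketch. The Python code below is not the FASST API and does not reproduce the authors' CHiME system; the helper names (nmf_mu, separate_speech) and parameter choices are hypothetical. It assumes a plain NMF spectral-variance model in which noise spectral patterns are pre-trained on a noise-only excerpt and then held fixed (the "user-specified constraint") while speech patterns are estimated from the mixture, followed by Wiener-style filtering.

# Illustrative sketch only, NOT the FASST API: constrained NMF separation
# where the noise spectral patterns are pre-trained and held fixed.
import numpy as np
from scipy.signal import stft, istft

EPS = 1e-12

def nmf_mu(V, W, H, n_iter=100, fixed_cols=0):
    # Multiplicative updates for KL-divergence NMF, V ~= W @ H.
    # The last `fixed_cols` columns of W (the noise patterns) are kept fixed,
    # which is how prior knowledge enters this toy model.
    for _ in range(n_iter):
        WH = W @ H + EPS
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + EPS)
        WH = W @ H + EPS
        upd = ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + EPS)
        if fixed_cols:
            upd[:, -fixed_cols:] = 1.0  # no update for the constrained columns
        W *= upd
    return W, H

def separate_speech(mix, noise_only, fs, n_speech=32, n_noise=16):
    _, _, X = stft(mix, fs=fs, nperseg=512)
    _, _, N = stft(noise_only, fs=fs, nperseg=512)
    V, Vn = np.abs(X) ** 2, np.abs(N) ** 2
    rng = np.random.default_rng(0)

    # 1) Learn noise spectral patterns from a noise-only excerpt (prior knowledge).
    Wn, _ = nmf_mu(Vn, rng.random((Vn.shape[0], n_noise)) + EPS,
                   rng.random((n_noise, Vn.shape[1])) + EPS)

    # 2) Factor the mixture: speech patterns free, noise patterns fixed.
    W = np.hstack([rng.random((V.shape[0], n_speech)) + EPS, Wn])
    H = rng.random((n_speech + n_noise, V.shape[1])) + EPS
    W, H = nmf_mu(V, W, H, fixed_cols=n_noise)

    # 3) Wiener-like mask: fraction of the mixture power explained by speech.
    Vs = W[:, :n_speech] @ H[:n_speech]
    mask = Vs / (W @ H + EPS)
    _, speech = istft(mask * X, fs=fs, nperseg=512)
    return speech

In the paper's setting, the separated speech signal would then be passed to the recognizer in place of the noisy mixture; the actual FASST framework supports richer structured models and constraints than this single fixed-dictionary example.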
Main file
CHiME_submission_v4.pdf (104.57 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00598734, version 1 (07-06-2011)

Identifiers

  • HAL Id: inria-00598734, version 1

Cite

Alexey Ozerov, Emmanuel Vincent. Using the FASST source separation toolbox for noise robust speech recognition. International Workshop on Machine Listening in Multisource Environments (CHiME 2011), Sep 2011, Florence, Italy. ⟨inria-00598734⟩
