End-to-end model for named entity recognition from speech without paired training data

Salima Mdhaffar; Jarod Duret; Titouan Parcollet; Yannick Estève

Conference paper. Year: 2022

Salima Mdhaffar

Role: Author

Jarod Duret

Role: Author

Titouan Parcollet

Role: Author

Yannick Estève

Role: Author
PersonId: 11645
IdHAL: yannick-esteve
ORCID: 0000-0002-3656-8883
IdRef: 070531668

Laboratoire Informatique d'Avignon

Abstract

Recent work has shown that end-to-end neural approaches are becoming very popular for spoken language understanding (SLU). The term end-to-end refers to the use of a single model optimized to extract semantic information directly from the speech signal. A major issue for such models is the lack of paired audio and textual data with semantic annotation. In this paper, we propose an approach to build an end-to-end neural model that extracts semantic information in a scenario where no paired audio data is available. Our approach relies on an external model trained to generate a sequence of vector representations from text. These representations mimic the hidden representations that an end-to-end automatic speech recognition (ASR) model would generate internally when processing a speech signal. An SLU neural module is then trained using these representations as input and the annotated text as output. Finally, the SLU module replaces the top layers of the ASR model to form the end-to-end model. Our experiments on named entity recognition, carried out on the QUAERO corpus, show that this approach is very promising, yielding better results than a comparable cascade approach or the use of synthetic voices.
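The module-swapping idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `asr_lower`, `text_to_hidden`, `train_slu`, and `build_end_to_end` are hypothetical names, the "hidden representations" are two-dimensional toy vectors, and the SLU module is a nearest-centroid tagger standing in for a neural network. The sketch also assumes, purely for illustration, that an audio frame is a single number equal to the length of the spoken word, so that text-derived and speech-derived vectors live in the same space.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[float, float]
Tagger = Callable[[List[Vector]], List[str]]


def asr_lower(frames: Sequence[float]) -> List[Vector]:
    """Hypothetical stand-in for the lower layers of a trained ASR model."""
    return [(f, f * f) for f in frames]


def text_to_hidden(tokens: Sequence[str]) -> List[Vector]:
    """Hypothetical external model: text in, ASR-like hidden vectors out."""
    return [(float(len(t)), float(len(t)) ** 2) for t in tokens]


def train_slu(examples: Sequence[Tuple[List[str], List[str]]]) -> Tagger:
    """Fit a toy nearest-centroid tagger using annotated text only (no audio)."""
    sums: Dict[str, List[float]] = {}
    for tokens, tags in examples:
        for vec, tag in zip(text_to_hidden(tokens), tags):
            acc = sums.setdefault(tag, [0.0, 0.0, 0.0])
            acc[0] += vec[0]
            acc[1] += vec[1]
            acc[2] += 1.0
    centroids = {t: (a[0] / a[2], a[1] / a[2]) for t, a in sums.items()}

    def slu_top(hidden: List[Vector]) -> List[str]:
        def nearest(v: Vector) -> str:
            return min(
                centroids,
                key=lambda t: (v[0] - centroids[t][0]) ** 2
                + (v[1] - centroids[t][1]) ** 2,
            )
        return [nearest(v) for v in hidden]

    return slu_top


def build_end_to_end(lower: Callable, top: Tagger) -> Callable:
    """Replace the ASR top layers: speech -> lower ASR layers -> SLU module."""
    return lambda frames: top(lower(frames))


# Training uses text with semantic annotation only.
annotated_text = [
    (["in", "Avignon"], ["O", "LOC"]),
    (["at", "Incheon"], ["O", "LOC"]),
]
slu_top = train_slu(annotated_text)

# Inference runs on (toy) speech frames through the swapped model.
model = build_end_to_end(asr_lower, slu_top)
tags = model([2.0, 7.0])
print(tags)  # ['O', 'LOC']
```

The point of the sketch is the data flow: the SLU module never sees audio during training, yet it can be stacked on the ASR lower layers at inference time because both sources produce vectors in the same representation space.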

Keywords

Low resource spoken language understanding; End to end neural model; Named Entity Recognition

Domains

Computation and Language [cs.CL]

Main file

IS22___Textual_injection_for_e2e_NER-3.pdf (652.13 KB)

Origin: Files produced by the author(s)

Contributor: Yannick Estève

https://hal.science/hal-03701145

Submitted on: Tuesday, June 21, 2022, 16:49:05

Last modified on: Tuesday, January 16, 2024, 16:28:21

Long-term archiving on: Thursday, September 22, 2022, 19:52:31

Dates and versions

hal-03701145, version 1 (21-06-2022)

Identifiers

HAL Id: hal-03701145, version 1

Cite

Salima Mdhaffar, Jarod Duret, Titouan Parcollet, Yannick Estève. End-to-end model for named entity recognition from speech without paired training data. Interspeech 2022, Sep 2022, Incheon, South Korea. ⟨hal-03701145⟩

Collections

UNIV-AVIGNON GENCI LIA

49 views

184 downloads
