Language and Variety Verification on Broadcast News for Portuguese
Journal article. Speech Communication, special issue on Iberian languages. Year: 2008

Abstract

This paper describes a language/accent verification system for Portuguese that explores different types of properties: acoustic, phonotactic and prosodic. The two-stage system is designed to be used as a pre-processing module for the Portuguese Automatic Speech Recognition (ASR) system developed at INESC-ID. Because the ASR system is applied every day to transcribe the evening news from a Portuguese public TV channel, other languages (mainly English) and other varieties of Portuguese are very likely to be present. In the first stage, for each automatically detected speaker, the system verifies whether the spoken language is Portuguese, as opposed to nine other languages: English, Belgian Dutch, Croatian, Czech, Galician, Greek, Hungarian, Slovenian and Slovak. The identified Portuguese speakers are then fed to the second stage, which aims at identifying the Portuguese variety: European, Brazilian or African Portuguese from five countries. The identification results are then used either to mark the speech data as untranscribable or to forward it to the European Portuguese ASR system or to a system tuned for other languages or varieties. The language verification system achieved an equal error rate of 2.5% for European Portuguese. In terms of variety identification, the overall rate of correct identification was 69.0% when all seven varieties are considered, and the best results were obtained for Brazilian Portuguese, which was also the variety that proved easiest to identify in perceptual experiments. If all African varieties are merged into a single broad class, the identification rate rises to 94.7%. The fact that the prosodic system alone can achieve an identification rate of 77% is also worth investigating.
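For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how an equal error rate can be estimated from verification scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the threshold sweep over observed scores, and the toy scores in the usage lines are all assumptions made for illustration. The equal error rate is the operating point at which the miss rate (Portuguese trials rejected) equals the false-alarm rate (other-language trials accepted).

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the equal error rate (EER) from two lists of scores.

    A trial is accepted when its score is >= the threshold.
    Returns (eer, threshold), where the threshold is the observed score
    at which miss rate and false-alarm rate are closest to each other.
    Hypothetical helper for illustration only.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every distinct observed score.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))

    best_gap, best_eer, best_threshold = None, None, None
    for t in thresholds:
        # Miss rate: fraction of target trials rejected at this threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False-alarm rate: fraction of non-target trials accepted.
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            # Report the midpoint of the two rates as the EER estimate.
            best_gap, best_eer, best_threshold = gap, (miss + false_alarm) / 2, t
    return best_eer, best_threshold


# Toy scores, invented for this example (not data from the paper).
portuguese = [0.9, 0.8, 0.7, 0.4]
other_languages = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(portuguese, other_languages)
```

With real data, the two lists would hold the per-speaker verification scores for Portuguese and non-Portuguese trials; with few trials the two error rates may never coincide exactly, which is why the sketch reports the midpoint at the closest threshold.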
Main file
SPECOM-S-07-00107.pdf (1.45 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-00664601, version 1 (31-01-2012)

Identifiers

  • HAL Id: hal-00664601, version 1

Cite

Jean-Luc Rouas, Isabel Trancoso, Céu Viana, Mónica Abreu. Language and Variety Verification on Broadcast News for Portuguese. Speech Communication, special issue on Iberian languages, 2008, 50 (11-12), pp.965-979. ⟨hal-00664601⟩