Corpus base linguistic exploration via forced alignments with a ‘light-weight’ ASR tool
Conference paper, Year: 2017

Abstract

In this work, we use a baseline ASR system developed for a speech corpus of Embosi (Bantu C25), a less-resourced language. A first version of this system was used as a lightweight ASR tool to produce forced alignments for corpus-based linguistic studies. Several linguistic studies of Embosi have identified the deletion of associative morphemes and vowel elision as outstanding issues for further research. We present empirical evidence from the Embosi speech corpus that these morphemes are not deleted equally across all classes; instead, there are systematic differences in how often the associative class morphemes are deleted. We also observe in the corpus that vowel elision interacts with the deletion of these morphemes. We show that, even with limited language resources, linguistic analysis of less-resourced languages can be carried out using simple, lightweight models on small speech corpora.
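
The abstract does not spell out how alignment output is turned into per-class counts. Purely as an illustration, the Python sketch below assumes that, at each associative-morpheme site, the forced aligner has already chosen between pronunciation variants with and without the morpheme (and with or without vowel elision). It then computes a deletion rate per class and a deletion-by-elision contingency table. The record type `AlignedToken`, its fields, and the class labels are hypothetical and are not taken from the paper; the toy records are invented and are not corpus results.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedToken:
    """One associative-morpheme site after forced alignment (hypothetical format)."""
    assoc_class: str        # associative class label, e.g. "cl1" (hypothetical labels)
    morpheme_deleted: bool  # aligner chose the variant without the morpheme
    vowel_elided: bool      # aligner chose a variant with vowel elision


def deletion_rates(tokens):
    """Return {class: fraction of sites where the associative morpheme was deleted}."""
    totals = Counter()
    deleted = Counter()
    for t in tokens:
        totals[t.assoc_class] += 1
        if t.morpheme_deleted:
            deleted[t.assoc_class] += 1
    # Counter returns 0 for classes with no deletions, so every class gets a rate.
    return {c: deleted[c] / totals[c] for c in totals}


def elision_by_deletion(tokens):
    """2x2 contingency table keyed by (morpheme_deleted, vowel_elided)."""
    table = Counter()
    for t in tokens:
        table[(t.morpheme_deleted, t.vowel_elided)] += 1
    return table


if __name__ == "__main__":
    # Toy records invented for illustration only.
    toy = [
        AlignedToken("cl1", True, True),
        AlignedToken("cl1", False, False),
        AlignedToken("cl2", False, False),
        AlignedToken("cl2", False, True),
    ]
    print(deletion_rates(toy))  # {'cl1': 0.5, 'cl2': 0.0}
    print(elision_by_deletion(toy)[(True, True)])  # 1
```

A contingency table such as the one returned by `elision_by_deletion` is one simple way to look for an interaction between elision and deletion; any statistical test applied to it would be a further assumption beyond this sketch.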
File not deposited

Dates and versions

hal-01837174, version 1 (12-07-2018)

Identifiers

  • HAL Id: hal-01837174, version 1

Cite

Jamison Cooper-Leavitt, Lori Lamel, Annie Rialland, Martine Adda-Decker, Gilles Adda. Corpus base linguistic exploration via forced alignments with a ‘light-weight’ ASR tool. Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, Nov 2017, Poznań, Poland. ⟨hal-01837174⟩