Corpus-based linguistic exploration via forced alignments with a light-weight ASR tool
Abstract
In this work, we make use of a baseline ASR system developed for a speech corpus of Embosi (Bantu C25), a less-resourced language. A first version of this system was used as a light-weight ASR tool to produce forced alignments for corpus-based linguistic studies. Several linguistic studies of Embosi have identified the deletion of associative morphemes and vowel elision as outstanding issues for further research. We present empirical evidence from the Embosi speech corpus that the deletion of these morphemes is not observed equally across all classes; rather, there are systematic differences across classes in how often the associative morphemes are deleted. We also observe in the corpus that vowel elision interacts with the deletion of these morphemes. We show that, even with limited language resources, linguistic analysis of less-resourced languages can be carried out using simple, light-weight models on small speech corpora.