Automatic recognition of French Cued Speech using multimodal fusion based on hidden Markov models
Abstract
In this article, automatic recognition of French Cued Speech based on hidden Markov models (HMMs) is presented. Cued Speech is a visual communication system that uses handshapes placed in different positions, in combination with the lip patterns of speech, to make all the sounds of spoken language clearly understandable to deaf and hearing-impaired people. The aim of Cued Speech is to overcome the limitations of lipreading and thus enable deaf children and adults to understand full spoken language. Automatic recognition of Cued Speech requires both lip shape and gesture recognition; moreover, the integration of the two modalities is of the greatest importance. In this study, the lip shape component is fused with the gesture components to realize Cued Speech recognition. Using concatenative feature fusion and multi-stream HMM decision fusion, vowel and consonant recognition experiments were conducted. For vowel recognition, an accuracy of 87.6% was obtained, a 61.3% relative improvement over the sole use of lip shape parameters. For consonant recognition, an accuracy of 78.9% was obtained, a 56% relative improvement over the use of lip shape only. In addition to vowel and consonant recognition, a complete phoneme recognition experiment using concatenated feature vectors and Gaussian mixture model (GMM) discrimination yielded a 74.4% phoneme accuracy. These results are comparable to the accuracies obtained using the audio signal, demonstrating the effectiveness of the proposed approaches for Cued Speech recognition.
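The two fusion strategies named in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the feature values, class scores, stream weights, and function names below are all hypothetical assumptions.

```python
# Sketch of the two fusion schemes under illustrative assumptions.
# All values, weights, and names below are hypothetical, not from the paper.

def concatenative_fusion(lip_features, hand_features):
    """Feature-level fusion: stack the two modality vectors into a single
    observation vector, which one HMM would then model jointly."""
    return lip_features + hand_features

def multistream_decision(lip_logliks, hand_logliks, w_lip=0.4, w_hand=0.6):
    """Decision-level fusion: weight the per-class log-likelihoods of
    independently trained lip and hand HMM streams, then pick the best class."""
    combined = [w_lip * l + w_hand * h
                for l, h in zip(lip_logliks, hand_logliks)]
    return combined.index(max(combined))

# Example: three candidate vowel classes scored by each stream.
lip_ll = [-42.0, -40.5, -45.2]
hand_ll = [-30.1, -33.7, -29.8]
best = multistream_decision(lip_ll, hand_ll)  # -> 0 (first class wins here)
```

In the multi-stream scheme, the weights control how much each modality contributes to the final decision; tuning them per task (e.g. favoring the hand stream for cues that are ambiguous on the lips) is the usual motivation for decision-level over feature-level fusion.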
Origin: Files produced by the author(s)