A Multilinear Tongue Model Derived from Speech Related MRI Data of the Human Vocal Tract

We present a multilinear statistical model of the human tongue that captures anatomical and tongue-pose-related shape variations separately. The model is derived from 3D magnetic resonance imaging (MRI) data of 11 speakers sustaining speech-related vocal tract configurations. To extract model parameters, we use a minimally supervised method based on an image segmentation approach and a template fitting technique. In addition, we use image denoising to deal with possibly corrupt data, palate surface reconstruction to handle contacts between the tongue and the palate, and a bootstrap strategy to refine the obtained shapes. Our evaluation shows that, by limiting the degrees of freedom for the anatomical and speech-related variations to 5 and 4, respectively, we obtain a model that can reliably register unknown data while avoiding overfitting effects. We also show that the model can be used to generate plausible tongue animation by tracking sparse motion capture data.
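To make the idea of separate anatomy and pose parameters concrete, the following is a minimal sketch of a two-mode multilinear shape model built with a truncated higher-order SVD. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the decomposition, so the use of HOSVD, the function names `fit_multilinear_model` and `generate_shape`, the number of poses (13), and the number of mesh coordinates (60) are all hypothetical. The data is synthetic, not MRI-derived; only the 11 speakers and the 5 and 4 degrees of freedom come from the abstract.

```python
import numpy as np


def fit_multilinear_model(shapes, n_anatomy=5, n_pose=4):
    """Fit a two-mode multilinear model via truncated HOSVD (an assumed technique).

    shapes: array of shape (n_speakers, n_poses, n_coords), one flattened
    mesh per speaker and pose, all meshes in vertex correspondence.
    """
    mean = shapes.mean(axis=(0, 1))
    centered = shapes - mean
    n_speakers, n_poses, n_coords = centered.shape

    # Unfold the data tensor along the speaker mode and along the pose mode.
    speaker_unfolding = centered.reshape(n_speakers, n_poses * n_coords)
    pose_unfolding = centered.transpose(1, 0, 2).reshape(
        n_poses, n_speakers * n_coords
    )

    # Keep the leading left singular vectors of each unfolding.
    u_anatomy = np.linalg.svd(speaker_unfolding, full_matrices=False)[0][:, :n_anatomy]
    u_pose = np.linalg.svd(pose_unfolding, full_matrices=False)[0][:, :n_pose]

    # Core tensor: the centered data projected onto both truncated bases.
    core = np.einsum("spd,sa,pb->abd", centered, u_anatomy, u_pose)
    return mean, core, u_anatomy, u_pose


def generate_shape(mean, core, anatomy_weights, pose_weights):
    """Evaluate the model for one anatomy and one pose weight vector."""
    return mean + np.einsum("abd,a,b->d", core, anatomy_weights, pose_weights)


# Synthetic demo data (hypothetical sizes, random values, exactly low-rank).
rng = np.random.default_rng(0)
n_speakers, n_poses, n_coords = 11, 13, 60
true_core = rng.normal(size=(5, 4, n_coords))
anatomy = rng.normal(size=(n_speakers, 5))
anatomy -= anatomy.mean(axis=0)  # zero-mean factors keep the data mean equal to `base`
pose = rng.normal(size=(n_poses, 4))
pose -= pose.mean(axis=0)
base = rng.normal(size=n_coords)
shapes = base + np.einsum("abd,sa,pb->spd", true_core, anatomy, pose)

mean, core, u_anatomy, u_pose = fit_multilinear_model(shapes)

# Reconstruct the shape of speaker 2 in pose 7 from its two weight vectors.
recon = generate_shape(mean, core, u_anatomy[2], u_pose[7])
max_error = np.abs(recon - shapes[2, 7]).max()
```

In this sketch, row `s` of `u_anatomy` acts as the anatomy weights of training speaker `s`, and row `p` of `u_pose` as the weights of training pose `p`. Holding the anatomy weights fixed while varying the pose weights changes only the articulation of one synthetic speaker, which is the kind of separation the abstract describes.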

Keywords

vocal tract MRI; tongue shape analysis; statistical model

Domains

Computer Vision and Pattern Recognition [cs.CV]

Main file

article.pdf (2.94 MB)

anc/01MRIM_projection_68.mp4 (510.82 KB)

anc/01MRIM_projection_69.mp4 (512.59 KB)

anc/01MRIM_projection_70.mp4 (515.33 KB)

anc/01MRIM_projection_73.mp4 (511.8 KB)

anc/01MRIM_projection_74.mp4 (516.94 KB)

anc/01MRIM_projection_76.mp4 (512.42 KB)

anc/01MRIM_projection_77.mp4 (511.98 KB)

anc/VP05_fixed_anatomy.mp4 (11.04 MB)

anc/VP05_full.mp4 (10.37 MB)

Origin: Files produced by the author(s)

Contributor: Ingmar Steiner

https://hal.science/hal-01418460

Submitted on: Saturday, April 14, 2018, 16:02:04

Last modified on: Thursday, April 4, 2024, 21:20:49

Dates and versions

hal-01418460, version 1 (16-12-2016)

hal-01418460, version 2 (14-04-2018)

Identifiers

HAL Id: hal-01418460, version 2
arXiv: 1612.05005
DOI: 10.1016/j.csl.2018.02.001

Cite

Alexander Hewer, Stefanie Wuhrer, Ingmar Steiner, Korin Richmond. A Multilinear Tongue Model Derived from Speech Related MRI Data of the Human Vocal Tract. Computer Speech and Language, 2018, 51, pp.68-92. ⟨10.1016/j.csl.2018.02.001⟩. ⟨hal-01418460v2⟩

Collections

UGA CNRS INRIA LJK LJK_GI LJK_GI_MORPHEO INRIA2
