Abstract: Our work deals with the classical problem of merging heterogeneous and asynchronous parameters. It is well known that lip reading improves speech recognition scores, especially in noisy conditions; we therefore study the modeling of acoustic and labial parameters more closely and propose two Automatic Speech Recognition systems:
a Direct Identification system, based on a classical HMM approach, in which no correlation between visual and acoustic parameters is assumed;
two correlated models, a master HMM and a slave HMM, which process the labial observations and the acoustic observations respectively.
To assess each approach, we use a segmental pre-processing and a robust acoustic elementary unit, the "pseudodiphone".
Our task is the recognition of spelled French letters in clean and noisy (cocktail party) environments. Whatever the approach and the condition, introducing labial features improves performance, but the difference between the two models is not large enough to favor either one.