Variable-length class sequences based on a hierarchical approach: MCnv

Imed Zitouni 1, Kamel Smaïli 2, Jean-Paul Haton 3
2 SMarT - Statistical Machine Translation and Speech Modelization and Text
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
3 PAROLE - Analysis, perception and recognition of speech
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : In contrast to conventional n-gram approaches, which are the most widely used language models in continuous speech recognition systems, the multigram approach models a stream of variable-length sequences. Motivated by the success of class-based methods in language modeling, we explore their potential use in a multigram framework. To overcome the independence assumption of the classical multigram, we propose in this paper a hierarchical model which successively relaxes this assumption. We call this model MCnv. The estimation of the model parameters can be formulated as a Maximum Likelihood estimation problem from incomplete data, applied at different levels (j ∈ {1, ..., v}). We show that estimates of the model parameters can be computed through an iterative Expectation-Maximization algorithm. A few experimental tests were carried out on a class corpus extracted from the automatically labeled French "Le Monde" word corpus. Results show that MCnv outperforms both the class-based multigram and the interpolated class trigram model.
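As a rough illustration of the classical multigram that the abstract takes as its starting point (not the hierarchical MCnv model itself), the sketch below computes the likelihood of a class stream as the sum, over all segmentations into variable-length sequences, of the product of the sequence probabilities, with sequences treated as independent. The function name, the class labels, and the probability values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from typing import Dict, Sequence, Tuple


def multigram_likelihood(
    stream: Sequence[str],
    seq_prob: Dict[Tuple[str, ...], float],
    max_len: int,
) -> float:
    """Likelihood of `stream` under a classical multigram.

    Sums, over every segmentation of `stream` into sequences of at most
    `max_len` units, the product of the sequence probabilities
    (sequences are assumed independent of one another).
    """
    # alpha[t] holds the total probability of all segmentations of stream[:t].
    alpha = [0.0] * (len(stream) + 1)
    alpha[0] = 1.0
    for t in range(1, len(stream) + 1):
        # The last sequence of a segmentation of stream[:t] has length k.
        for k in range(1, min(max_len, t) + 1):
            seq = tuple(stream[t - k:t])
            alpha[t] += alpha[t - k] * seq_prob.get(seq, 0.0)
    return alpha[len(stream)]


# Hypothetical toy distribution over class sequences (illustrative values only).
toy_probs = {
    ("DET",): 0.2,
    ("NOUN",): 0.3,
    ("VERB",): 0.1,
    ("DET", "NOUN"): 0.4,
}
likelihood = multigram_likelihood(["DET", "NOUN", "VERB"], toy_probs, max_len=2)
```

Here two segmentations have non-zero probability, [DET][NOUN][VERB] and [DET NOUN][VERB]. A forward recursion of this kind is what makes the sum over segmentations tractable, and the segmentation is the hidden variable that makes the problem one of estimation from incomplete data.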
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01112917
Contributor : Kamel Smaïli
Submitted on : Tuesday, February 3, 2015 - 7:41:21 PM
Last modification on : Tuesday, December 18, 2018 - 4:38:02 PM

Identifiers

  • HAL Id : hal-01112917, version 1

Citation

Imed Zitouni, Kamel Smaïli, Jean-Paul Haton. Variable-length class sequences based on a hierarchical approach: MCnv. SPECOM - Proceedings of the International Workshop on speech and communication, 1998, Saint-Petersbourg, Russia. ⟨hal-01112917⟩
