Journal article, IEEE Transactions on Audio, Speech and Language Processing, Year: 2009

Generative and Discriminative Methods using Morphological Information for Sentence Segmentation of Turkish

Abstract

This paper presents novel generative, discriminative, and hybrid sequence classification methods for segmenting Turkish utterances into sentences. In the literature, this task is generally solved using statistical models that take advantage of lexical information, among other sources. However, Turkish has a productive morphology that generates an exponentially large vocabulary, which harms language models such as the established hidden event language model (HELM). We extend this model to a factored hidden event language model (fHELM) in order to take advantage of morphologically informed features in addition to the word sequence. Our results indicate that fHELMs yield a 26% reduction in error rate on Turkish broadcast news. Combining lexical, morphological, and prosodic information using these new models and discriminative classifiers (boosting and conditional random fields) yields significant performance improvements over any of the classifiers alone.
File not deposited

Dates and versions

hal-00447936, version 1 (17-01-2010)

Identifiers

  • HAL Id: hal-00447936, version 1

Cite

Guz Umit, Favre Benoit, Hakkani-Tür Dilek, Tur Gokhan. Generative and Discriminative Methods using Morphological Information for Sentence Segmentation of Turkish. IEEE Transactions on Audio, Speech and Language Processing, 2009, 17 (5), pp. 895-903. ⟨hal-00447936⟩