A Hybrid Model for Urdu Hindi Transliteration

Abstract : We report a novel hybrid approach to Urdu-to-Hindi transliteration that combines finite-state machine (FSM) based techniques with a statistical word language model. The output of the FSM is filtered with the word language model to produce the correct Hindi output. The main problem addressed is the omission of diacritical marks from the input Urdu text: our system produces the correct Hindi output even when the crucial information carried by diacritical marks is absent. The hybrid approach improves accuracy from 50.7% for the transducer-only approach to 79.1%. The results show that a word language model can improve performance by disambiguating the output of the transducer, especially when diacritical marks are not present in the Urdu input.
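The two-stage idea in the abstract (a transducer proposes candidates, a word language model picks among them) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the romanized consonant skeleton standing in for undiacritized Urdu input, the `fsm_candidates` function standing in for the real finite-state transducer, the toy corpus, and the use of plain unigram counts in place of the paper's word language model.

```python
from collections import Counter
from itertools import product

# Hypothetical inventory: a missing diacritic may hide any of these short
# vowels, or no vowel at all ("").
SHORT_VOWELS = ["", "a", "i", "u"]


def fsm_candidates(skeleton):
    """Toy stand-in for the transducer.

    Given a romanized consonant skeleton, propose every word obtained by
    placing an optional short vowel after each consonant except the last.
    """
    if len(skeleton) < 2:
        return [skeleton]
    candidates = []
    for fill in product(SHORT_VOWELS, repeat=len(skeleton) - 1):
        # zip stops after the first len(skeleton) - 1 consonants;
        # the final consonant is appended unchanged.
        body = "".join(c + v for c, v in zip(skeleton, fill))
        candidates.append(body + skeleton[-1])
    return candidates


def train_unigram(tokens):
    """Count word frequencies in a (hypothetical) target-language corpus."""
    return Counter(tokens)


def pick_best(candidates, counts):
    """Filter step: keep the candidate the language model has seen most often.

    Unseen candidates count as 0; ties resolve to the earliest candidate.
    """
    return max(candidates, key=lambda w: counts[w])


# Toy corpus, invented purely for this example.
counts = train_unigram(["kal", "kal", "kul", "kal", "din"])
best = pick_best(fsm_candidates("kl"), counts)
```

Here the transducer alone cannot choose between the candidates for `"kl"`, while the frequency counts single out one of them. A real system would score candidates in sentence context rather than in isolation.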
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01002165
Contributor : Didier Schwab
Submitted on : Tuesday, January 16, 2018 - 3:33:25 PM
Last modification on : Monday, July 8, 2019 - 3:08:13 PM
Long-term archiving on : Tuesday, April 17, 2018 - 12:00:24 PM

File

W09-3536.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-01002165, version 1

Citation

Muhammad Ghulam Abbas Malik, Laurent Besacier, Christian Boitet, Pushpak Bhattacharyya. A Hybrid Model for Urdu Hindi Transliteration. Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of NLP, ACL/IJCNLP Workshop on Named Entities (NEWS-09), Aug 2009, Singapore. pp.177-185. ⟨hal-01002165⟩
