A Hybrid Model for Urdu Hindi Transliteration

We report in this paper a novel hybrid approach to Urdu-to-Hindi transliteration that combines finite-state machine (FSM) based techniques with a statistical word language model. The output of the FSM is filtered with the word language model to produce the correct Hindi output. The main problem addressed is the omission of diacritical marks from the input Urdu text: our system produces the correct Hindi output even when the crucial information carried by diacritical marks is absent. The approach improves accuracy from 50.7% for the transducer-only approach to 79.1%. The results show that a word language model can improve performance by disambiguating the output of the transducer-only approach, especially when diacritical marks are not present in the Urdu input.
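The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says that a word language model filters the FSM output, so the bigram model, the add-one smoothing, the exhaustive search, and the romanized placeholder tokens below are all assumptions made for illustration. The candidate lists stand in for the alternatives a transducer might emit when diacritics are missing.

```python
import math
from collections import Counter
from itertools import product


def train_bigram(sentences):
    """Collect unigram and bigram counts from tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sequence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a word sequence."""
    vocab = len(unigrams)
    tokens = ["<s>"] + list(sequence)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(tokens, tokens[1:])
    )


def disambiguate(candidates, unigrams, bigrams):
    """Return the candidate sequence that the language model scores highest.

    Exhaustive search over all combinations, for clarity only; a larger
    system would use dynamic programming or beam search instead.
    """
    return max(
        product(*candidates),
        key=lambda seq: log_prob(seq, unigrams, bigrams),
    )


# Hypothetical toy corpus of romanized target-side sentences (placeholder data).
corpus = [["kal", "milenge"], ["kal", "aana"], ["kul", "milakar"]]
unigrams, bigrams = train_bigram(corpus)

# Hypothetical transducer output: one list of alternatives per input word.
candidates = [["kul", "kal", "kil"], ["milenge"]]
best = disambiguate(candidates, unigrams, bigrams)
print(best)
```

In this toy setting the model prefers the sequence whose word pairs were seen most often in the corpus, which is the sense in which a language model can disambiguate transducer output that is ambiguous without diacritics.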

Domains

Document and Text Processing

Main file

W09-3536.pdf (387.2 KB)

Origin: Publisher files allowed on an open archive

Contributor: Didier Schwab

https://hal.science/hal-01002165

Submitted on: Tuesday, January 16, 2018, 15:33:25

Last modified on: Thursday, April 4, 2024, 18:19:03

Long-term archiving on: Tuesday, April 17, 2018, 12:00:24

Dates and versions

hal-01002165 , version 1 (16-01-2018)

Identifiers

HAL Id: hal-01002165, version 1

Cite

Muhammad Ghulam Abbas Malik, Laurent Besacier, Christian Boitet, Pushpak Bhattacharyya. A Hybrid Model for Urdu Hindi Transliteration. Joint conference of the 47th Annual Meeting of the Association of Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of NLP ACL/IJCNLP Workshop on Named Entities (NEWS-09), Aug 2009, Manchester, United Kingdom. pp.177-185. ⟨hal-01002165⟩

Collections

UGA CNRS LIG LIG_TDCGE_GETALP POLYTECH-GRENOBLE LIG_SIDCH
