Conference paper. Year: 2010

Handling Unknown Words in Statistical Latent-Variable Parsing Models for Arabic, English and French

Abstract

This paper presents a study of the impact of using simple and complex morphological clues to improve the classification of rare and unknown words for parsing. We compare this approach to a language-independent technique often used in parsers which is based solely on word frequencies. This study is applied to three languages that exhibit different levels of morphological expressiveness: Arabic, French and English. We integrate information about Arabic affixes and morphotactics into a PCFG-LA parser and obtain state-of-the-art accuracy. We also show that these morphological clues can be learnt automatically from an annotated corpus.

Main file
spmrl2010.pdf (101.79 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00702414, version 1 (30-05-2012)

Identifiers

  • HAL Id: hal-00702414, version 1

Cite

Mohammed Attia, Jennifer Foster, Deirdre Hogan, Joseph Le Roux, Lamia Tounsi, et al. Handling Unknown Words in Statistical Latent-Variable Parsing Models for Arabic, English and French. First Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2010), 2010, United States. pp. 67-75. ⟨hal-00702414⟩
85 views
206 downloads
