Conference paper. Year: 2012

Statistical Parsing of Spanish and Data Driven Lemmatization

Abstract

Although parsing performance has greatly improved in recent years, grammar inference from treebanks for morphologically rich languages, especially from small treebanks, is still a challenging task. In this paper we investigate how state-of-the-art parsing performance can be achieved on Spanish, a language with a rich verbal morphology, with a non-lexicalized parser trained on a treebank containing only around 2,800 trees. We rely on accurate part-of-speech tagging and data-driven lemmatization in order to cope with lexical data sparseness. Providing state-of-the-art results on Spanish, our methodology is applicable to other languages.
Main file
SPMRL2012_Spanish.pdf (113.76 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00702496, version 1 (30-05-2012)

Identifiers

  • HAL Id: hal-00702496, version 1

Cite

Joseph Le Roux, Benoît Sagot, Djamé Seddah. Statistical Parsing of Spanish and Data Driven Lemmatization. ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages (SP-Sem-MRL 2012), Jul 2012, Jeju, South Korea. 6 p. ⟨hal-00702496⟩
