Statistical Parsing of Spanish and Data Driven Lemmatization

Abstract : Although parsing performance has greatly improved in recent years, grammar inference from treebanks for morphologically rich languages, especially from small treebanks, is still a challenging task. In this paper we investigate how state-of-the-art parsing performance can be achieved on Spanish, a language with a rich verbal morphology, with a non-lexicalized parser trained on a treebank containing only around 2,800 trees. We rely on accurate part-of-speech tagging and data-driven lemmatization in order to cope with lexical data sparseness. Providing state-of-the-art results on Spanish, our methodology is applicable to other languages.
Document type : Conference paper
Contributor : Joseph Le Roux
Submitted on : Wednesday, May 30, 2012 - 2:12:35 PM
Last modification on : Friday, May 3, 2019 - 1:41:45 AM
Long-term archiving on : Thursday, December 15, 2016 - 9:30:27 AM


Files produced by the author(s)


  • HAL Id : hal-00702496, version 1


Joseph Le Roux, Benoît Sagot, Djamé Seddah. Statistical Parsing of Spanish and Data Driven Lemmatization. ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages (SP-Sem-MRL 2012), Jul 2012, Jeju, South Korea. 6 p. ⟨hal-00702496⟩


