Statistical Parsing of Spanish and Data Driven Lemmatization

Abstract : Although parsing performance has greatly improved in recent years, grammar inference from treebanks for morphologically rich languages, especially from small treebanks, is still a challenging task. In this paper we investigate how state-of-the-art parsing performance can be achieved on Spanish, a language with a rich verbal morphology, with a non-lexicalized parser trained on a treebank containing only around 2,800 trees. We rely on accurate part-of-speech tagging and data-driven lemmatization in order to cope with lexical data sparseness. Providing state-of-the-art results on Spanish, our methodology is applicable to other languages.
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-00702496
Contributor : Joseph Le Roux
Submitted on : Wednesday, May 30, 2012 - 2:12:35 PM
Last modification on : Friday, May 3, 2019 - 1:41:45 AM
Long-term archiving on : Thursday, December 15, 2016 - 9:30:27 AM

File

SPMRL2012_Spanish.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00702496, version 1

Citation

Joseph Le Roux, Benoît Sagot, Djamé Seddah. Statistical Parsing of Spanish and Data Driven Lemmatization. ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages (SP-Sem-MRL 2012), Jul 2012, Jeju, South Korea. 6 p. ⟨hal-00702496⟩
