Statistical Parsing of Spanish and Data Driven Lemmatization

Joseph Le Roux; Benoît Sagot; Djamé Seddah

Conference paper, Year: 2012

Joseph Le Roux

Role: Author
PersonId: 1192450
IdHAL: joseph-le-roux
ORCID: 0000-0002-3889-8536

Laboratoire d'Informatique de Paris-Nord

Benoît Sagot

Role: Author
PersonId: 1461
IdHAL: bsagot
ORCID: 0000-0002-0107-8526
IdRef: 177454229

Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing

Djamé Seddah

Role: Author
PersonId: 11545
IdHAL: djameseddah
IdRef: 086185136

Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing

Institut des Sciences Humaines Appliquées

Abstract

Although parsing performance has greatly improved in recent years, grammar inference from treebanks for morphologically rich languages, especially from small treebanks, is still a challenging task. In this paper we investigate how state-of-the-art parsing performance can be achieved on Spanish, a language with a rich verbal morphology, with a non-lexicalized parser trained on a treebank containing only around 2,800 trees. We rely on accurate part-of-speech tagging and data-driven lemmatization in order to cope with lexical data sparseness. Our methodology provides state-of-the-art results on Spanish and is applicable to other languages.
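
To make the idea of data-driven lemmatization against lexical sparseness concrete, here is a minimal Python sketch. It is not the system described in the paper: it assumes a toy learner that derives suffix-rewrite rules from (form, POS tag, lemma) triples, and the function names, training triples, and tag names are all hypothetical. It only shows how mapping inflected Spanish forms to lemmas shrinks the set of distinct terminal symbols a parser trained on a small treebank has to handle.

```python
from collections import Counter, defaultdict

def suffix_rule(form, lemma):
    """Return (strip, add): drop `strip` from the end of form, then append `add`."""
    i = 0
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    return form[i:], lemma[i:]

def train(examples):
    """Count suffix rules per POS tag from (form, pos, lemma) triples."""
    rules = defaultdict(Counter)
    for form, pos, lemma in examples:
        rules[pos][suffix_rule(form.lower(), lemma.lower())] += 1
    return rules

def lemmatize(form, pos, rules):
    """Apply the matching rule with the longest stripped suffix (ties: most frequent)."""
    form = form.lower()
    candidates = [(len(strip), count, strip, add)
                  for (strip, add), count in rules.get(pos, {}).items()
                  if strip and form.endswith(strip)]
    if not candidates:
        return form  # no rule applies: fall back to the word form itself
    _, _, strip, add = max(candidates)
    return form[:len(form) - len(strip)] + add

# Hypothetical training triples (illustrative only, not taken from any treebank).
TRAIN = [
    ("cantamos", "VERB", "cantar"),
    ("hablamos", "VERB", "hablar"),
    ("cantaron", "VERB", "cantar"),
    ("hablaban", "VERB", "hablar"),
    ("casas", "NOUN", "casa"),
]

rules = train(TRAIN)

# Unseen forms: three inflections of one verb, plus a plural noun.
tokens = [("bailamos", "VERB"), ("bailaron", "VERB"),
          ("bailaban", "VERB"), ("mesas", "NOUN")]
forms = {form for form, _ in tokens}
lemmas = {lemmatize(form, pos, rules) for form, pos in tokens}
print(len(forms), len(lemmas))  # 4 2
```

In this toy setting, four distinct word forms collapse to two lemmas, so a non-lexicalized grammar sees fewer rare terminals. The sketch also relies on the POS tag to select the rule set, which is one reason accurate tagging matters before lemmatization.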

Domains

Computation and Language [cs.CL]

Main file

SPMRL2012_Spanish.pdf (113.76 KB)

Origin: Files produced by the author(s)

Contributor: Joseph Le Roux

https://hal.science/hal-00702496

Submitted on: Wednesday, May 30, 2012, 14:12:35

Last modified on: Friday, March 24, 2023, 14:52:55

Long-term archiving on: Thursday, December 15, 2016, 09:30:27

Dates and versions

hal-00702496, version 1 (30-05-2012)

Identifiers

HAL Id: hal-00702496, version 1

Cite

Joseph Le Roux, Benoît Sagot, Djamé Seddah. Statistical Parsing of Spanish and Data Driven Lemmatization. ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages (SP-Sem-MRL 2012), Jul 2012, Jeju, South Korea. 6 p. ⟨hal-00702496⟩

Collections

UNIV-PARIS7 UNIV-PARIS13 CNRS INRIA LIPN INRIA2 CAMPUS-AAR AAI GALILE SORBONNE-UNIVERSITE SU-LETTRES SORBONNE-PARIS-NORD ANR
