The First Annotated Corpus of Historical Basque - HAL open archive
Journal article, Digital Scholarship in the Humanities, Year: 2021

The First Annotated Corpus of Historical Basque

Ainara Estarrona
  • Role: Author
Izaskun Etxeberria
  • Role: Author
Manuel Padilla-Moyano
  • Role: Author
Ander Soraluze
  • Role: Author

Abstract

This article presents the elaboration of a morphosyntactically annotated diachronic corpus of Basque, and the first results obtained in processing historical varieties of this language with computational techniques. The corpus contains around one million words, spanning the 15th to the mid-18th century and encompassing the most significant written production in all historical dialects. Morphosyntactic tagging allows for systematic searches at different levels of complexity; in addition, a rich set of metadata enables searches based on sociohistorical criteria. This is not only the first tagged corpus of historical Basque but also a means to improve language processing tools by analyzing historical varieties more or less distant from the present-day standard language. Moreover, this project aims to set a model for further work on historical corpora of Basque and to inform similar projects in other languages.
Main file
First Annotated Historical Corpus of Basque.pdf (568.16 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03505658, version 1 (31-12-2021)

Cite

Ainara Estarrona, Izaskun Etxeberria, Ricardo Etxepare, Manuel Padilla-Moyano, Ander Soraluze. The First Annotated Corpus of Historical Basque. Digital Scholarship in the Humanities, 2021, ⟨10.1093/llc/fqab066⟩. ⟨hal-03505658⟩