BERTrade: Using Contextual Embeddings to Parse Old French

The successes of contextual word embeddings learned by training large-scale language models, while remarkable, have mostly occurred for languages where significant amounts of raw text are available and where annotated data in downstream tasks have a relatively regular spelling. Conversely, it is not yet completely clear whether these models are also well suited for lesser-resourced and more irregular languages. We study the case of Old French, which is in the interesting position of having a relatively limited amount of available raw text, but enough annotated resources to assess the relevance of contextual word embedding models for downstream NLP tasks. In particular, we use POS-tagging and dependency parsing to evaluate the quality of such models in a large array of configurations, including models trained from scratch on small amounts of raw text and models pre-trained on other languages but fine-tuned on Medieval French data.
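As a rough illustration of how tagging and parsing output can be scored against gold annotations, the sketch below computes token-level POS accuracy together with unlabeled and labeled attachment scores (UAS and LAS). The abstract does not name its metrics, so the choice of UAS/LAS, the `Token` layout, the `evaluate` helper, and the three-token toy sentence are all assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Assumed token layout: (POS tag, head index, dependency label); head 0 marks the root.
Token = Tuple[str, int, str]


def evaluate(gold: List[List[Token]], pred: List[List[Token]]) -> Dict[str, float]:
    """Return token-level POS accuracy, UAS and LAS over aligned sentences."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora must have the same number of sentences")
    total = pos_ok = uas_ok = las_ok = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences must align token by token")
        for (g_pos, g_head, g_rel), (p_pos, p_head, p_rel) in zip(gold_sent, pred_sent):
            total += 1
            pos_ok += g_pos == p_pos
            head_ok = g_head == p_head
            uas_ok += head_ok                      # correct head, label ignored
            las_ok += head_ok and g_rel == p_rel   # correct head and correct label
    if total == 0:
        raise ValueError("cannot score an empty corpus")
    return {"pos_acc": pos_ok / total, "uas": uas_ok / total, "las": las_ok / total}


# Hypothetical toy sentence (determiner, noun, verb); the annotations are invented.
gold = [[("DET", 2, "det"), ("NOUN", 3, "nsubj"), ("VERB", 0, "root")]]
pred = [[("DET", 2, "det"), ("NOUN", 3, "obj"), ("VERB", 0, "root")]]
scores = evaluate(gold, pred)
print(scores)
```

In this toy case every head and tag is correct but one dependency label is wrong, so only the labeled score drops. Comparing such scores across embedding configurations is one plausible way to read "evaluate the quality of such models".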

Keywords

Old French; Contextual word embeddings; Dependency parsing; Part-of-speech tagging

Domains

Computation and Language [cs.CL]

Main file

Bertrade_LREC_2022.pdf (536.81 KB)

Origin: Files produced by the author(s)

Contributor: Mathilde Regnault

https://hal.science/hal-03736840

Submitted on: Friday, July 22, 2022, 19:03:11

Last modified on: Friday, April 19, 2024, 16:18:57

Long-term archiving on: Sunday, October 23, 2022, 19:07:56

Dates and versions

hal-03736840, version 1 (22-07-2022)

License

Attribution - NonCommercial

Identifiers

HAL Id: hal-03736840, version 1

Cite

Loïc Grobol, Mathilde Regnault, Pedro Ortiz Suarez, Benoît Sagot, Laurent Romary, et al. BERTrade: Using Contextual Embeddings to Parse Old French. 13th Language Resources and Evaluation Conference, European Language Resources Association, Jun 2022, Marseille, France. ⟨hal-03736840⟩


Collections

ENS-PARIS CNRS INRIA UNIV-ORLEANS UNIV-PARIS3 LATTICE MODYCO LLF INRIA2 GENCI CAMPUS-AAR AAI PSL INSA-GROUPE SORBONNE-UNIVERSITE INSA-CVL UNIV-PARIS-LUMIERES UP-SOCIETES-HUMANITES ANR UNIV-PARIS-NANTERRE
