Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning - HAL open archive
Preprint, Working Paper. Year: 2016

Abstract

In this paper we consider two sequence tagging tasks for medieval Latin: part-of-speech tagging and lemmatization. Both are basic yet foundational preprocessing steps in applications such as text re-use detection. They are nevertheless complicated by the considerable orthographic variation typical of medieval Latin. In Digital Classics, these tasks are traditionally solved in a (i) cascaded and (ii) lexicon-dependent fashion. For example, a lexicon is first used to generate all potential lemma-tag pairs for a token, and a context-aware PoS tagger then selects the most appropriate pair. Apart from the problem of out-of-lexicon items, error percolation is a major downside of such approaches. In this paper we explore the possibility of solving both tasks elegantly with a single, integrated approach. To this end, we use a layered neural network architecture from the field of deep representation learning.
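
The cascaded, lexicon-dependent setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' system and not the integrated model the paper proposes. The toy lexicon, the transition scores, and the function names (`generate_candidates`, `cascaded_tag`) are all assumptions invented for illustration. Step one looks up candidate lemma-tag pairs, and step two picks one pair using the previous tag as context. The medieval spelling "michi" (classical "mihi") stands in for orthographic variation that falls outside the lexicon.

```python
from typing import Dict, List, Optional, Tuple

LemmaTag = Tuple[str, str]

# Toy lexicon (hypothetical): token -> candidate (lemma, PoS tag) pairs.
LEXICON: Dict[str, List[LemmaTag]] = {
    "rex": [("rex", "NOUN")],
    "amor": [("amor", "NOUN"), ("amo", "VERB")],  # ambiguous form
    "mihi": [("ego", "PRON")],                    # classical spelling only
}

# Toy context scores (hypothetical): (previous tag, candidate tag) -> score.
TRANSITIONS: Dict[Tuple[str, str], float] = {
    ("<s>", "NOUN"): 0.6,
    ("<s>", "VERB"): 0.4,
    ("NOUN", "NOUN"): 0.3,
    ("NOUN", "VERB"): 0.7,
}


def generate_candidates(token: str) -> List[LemmaTag]:
    """Step 1: the lexicon proposes every lemma-tag pair it knows for a token."""
    return LEXICON.get(token.lower(), [])


def cascaded_tag(tokens: List[str]) -> List[Tuple[str, Optional[LemmaTag]]]:
    """Step 2: choose one candidate per token, using the previous tag as context."""
    output: List[Tuple[str, Optional[LemmaTag]]] = []
    prev_tag = "<s>"
    for token in tokens:
        candidates = generate_candidates(token)
        if not candidates:
            # Out-of-lexicon item: the cascade has nothing to choose from,
            # and the unknown tag also weakens the context for later tokens.
            output.append((token, None))
            prev_tag = "<unk>"
            continue
        best = max(candidates, key=lambda c: TRANSITIONS.get((prev_tag, c[1]), 0.0))
        output.append((token, best))
        prev_tag = best[1]
    return output


if __name__ == "__main__":
    print(cascaded_tag(["amor", "michi"]))
    print(cascaded_tag(["rex", "amor"]))
```

In this toy setting the variant spelling "michi" receives no analysis at all, and whatever follows it is scored against an unknown context. This mirrors the two weaknesses the abstract names: out-of-lexicon items and error percolation.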

Domains

Linguistics
Main file: pandora_article.pdf (888.82 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01283083, version 1 (4 March 2016)
hal-01283083, version 2 (1 August 2017)

Identifiers

  • HAL Id: hal-01283083, version 2

Cite

Mike Kestemont, Jeroen de Gussem. Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning. 2016. ⟨hal-01283083v2⟩