Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning

Mike Kestemont; Jeroen de Gussem

Preprint, Working Paper. Year: 2016


Mike Kestemont

Role: Author

University of Antwerp

Jeroen de Gussem

Role: Author

Ghent University

Abstract

In this paper we consider two sequence tagging tasks for medieval Latin: part-of-speech (PoS) tagging and lemmatization. Both are basic yet foundational preprocessing steps in applications such as text re-use detection. They are, however, generally complicated by the considerable orthographic variation that is typical of medieval Latin. In Digital Classics, these tasks are traditionally solved in a (i) cascaded and (ii) lexicon-dependent fashion: a lexicon is first used to generate all potential lemma-tag pairs for a token, and a context-aware PoS-tagger then selects the most appropriate pair. Apart from the problems with out-of-lexicon items, error percolation is a major downside of such approaches. In this paper we explore the possibility of solving both tasks with a single, integrated approach. For this, we make use of a layered neural network architecture from the field of deep representation learning.
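The traditional cascaded, lexicon-dependent setup described in the abstract can be illustrated with a minimal sketch. Everything in it is a hypothetical toy, not the authors' system or any existing tool: the two-entry lexicon, the transition scores, and the function names `lookup` and `tag_sentence` are assumptions made purely for illustration. The sketch shows how a medieval spelling variant ("michi" for "mihi") falls outside the lexicon, and how that failure can then percolate to the choice made for the following token.

```python
from typing import Dict, List, Optional, Tuple

Analysis = Tuple[str, str]  # (lemma, PoS tag)

# Hypothetical toy lexicon: token -> all candidate (lemma, tag) pairs.
LEXICON: Dict[str, List[Analysis]] = {
    "mihi": [("ego", "PRON")],
    "amor": [("amo", "VERB"), ("amor", "NOUN")],
}

# Hypothetical context scores standing in for a context-aware PoS-tagger.
TRANSITION_SCORE: Dict[Tuple[str, str], float] = {
    ("<s>", "PRON"): 0.5,
    ("PRON", "NOUN"): 0.7,
    ("PRON", "VERB"): 0.3,
}


def lookup(token: str) -> List[Analysis]:
    """Stage 1: generate candidate lemma-tag pairs from the lexicon."""
    return LEXICON.get(token.lower(), [])


def tag_sentence(tokens: List[str]) -> List[Optional[Analysis]]:
    """Stage 2: pick one candidate per token using the previous tag."""
    output: List[Optional[Analysis]] = []
    prev_tag = "<s>"
    for token in tokens:
        candidates = lookup(token)
        if not candidates:
            # Out-of-lexicon item: the cascade has nothing to choose from.
            output.append(None)
            prev_tag = "UNK"
            continue
        best = max(
            candidates,
            key=lambda c: TRANSITION_SCORE.get((prev_tag, c[1]), 0.0),
        )
        output.append(best)
        prev_tag = best[1]
    return output


standard = tag_sentence(["mihi", "amor"])
variant = tag_sentence(["michi", "amor"])
print(standard)
print(variant)
```

In this toy setting the variant spelling receives no analysis at all, and the missing context changes the analysis selected for the next token. An integrated approach, as proposed in the abstract, is meant to avoid both the lexicon dependence and the stage-to-stage dependence.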

Keywords

PoS-tagging; Integrated Sequence Tagging; Latin; Medieval Latin

Domains

Linguistics

Main file

pandora_article.pdf (888.82 KB)

Origin: Files produced by the author(s)

Contributor: Jeroen De Gussem

https://hal.science/hal-01283083

Submitted on: Tuesday, 1 August 2017, 09:20:37

Last modified on: Tuesday, 14 November 2023, 11:58:06

Dates and versions

hal-01283083, version 1 (4 March 2016)

hal-01283083, version 2 (1 August 2017)

Identifiers

HAL Id: hal-01283083, version 2

Cite

Mike Kestemont, Jeroen de Gussem. Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning. 2016. ⟨hal-01283083v2⟩

