Transfer Learning for a Letter-Ngrams to Word Decoder in the Context of Historical Handwriting Recognition with Scarce Resources

Adeline Granet (1, 2), Emmanuel Morin (2), Harold Mouchère (1), Solen Quiniou (2), Christian Viard-Gaudin (1)
1 IPI - Image Perception Interaction
LS2N - Laboratoire des Sciences du Numérique de Nantes
2 TALN - Traitement Automatique du Langage Naturel
LS2N - Laboratoire des Sciences du Numérique de Nantes
Abstract: Lack of data can be an issue when beginning a new study on historical handwritten documents. To deal with this, we present the character-based decoder part of a multilingual approach based on transductive transfer learning for a historical handwriting recognition task on Italian Comedy Registers. The decoder must build a sequence of characters that corresponds to a word from a vector of letter-ngrams. As learning data, we created a new dataset from untapped resources that covers the same domain and period as our Italian Comedy data, as well as resources from common domains, periods, or languages. We obtain a 97.42% Character Recognition Rate and an 86.57% Word Recognition Rate on our Italian Comedy data, despite a lexical coverage of 67% between the Italian Comedy data and the training data. These results show that an efficient system can be obtained by carefully selecting the datasets used for the transfer learning.
Document type:
Conference paper
27th International Conference on Computational Linguistics (COLING 2018), Aug 2018, Santa Fe, NM, United States. Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, Santa Fe, New Mexico, USA, August 20-26, 2018, pp.1474-1484, 〈http://coling2018.org/〉
https://hal.archives-ouvertes.fr/hal-01868743
Contributor: Adeline Granet
Submitted on: Wednesday, September 5, 2018 - 17:37:31
Last modified on: Monday, September 17, 2018 - 11:49:23

File

COLING_2018.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01868743, version 1

Citation

Adeline Granet, Emmanuel Morin, Harold Mouchère, Solen Quiniou, Christian Viard-Gaudin. Transfer Learning for a Letter-Ngrams to Word Decoder in the Context of Historical Handwriting Recognition with Scarce Resources. 27th International Conference on Computational Linguistics (COLING 2018), Aug 2018, Santa Fe, NM, United States. Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, Santa Fe, New Mexico, USA, August 20-26, 2018, pp.1474-1484, 〈http://coling2018.org/〉. 〈hal-01868743〉
