Conference papers

Transfer Learning for a Letter-Ngrams to Word Decoder in the Context of Historical Handwriting Recognition with Scarce Resources

Adeline Granet 1, 2 Emmanuel Morin 2 Harold Mouchère 1 Solen Quiniou 2 Christian Viard-Gaudin 1
1 IPI - Image Perception Interaction
LS2N - Laboratoire des Sciences du Numérique de Nantes
2 TALN - Traitement Automatique du Langage Naturel
LS2N - Laboratoire des Sciences du Numérique de Nantes
Abstract : Lack of data can be an issue when beginning a new study on historical handwritten documents. To deal with this, we present the character-based decoder part of a multilingual approach based on transductive transfer learning for a historical handwriting recognition task on Italian Comedy Registers. The decoder must build a sequence of characters that corresponds to a word from a vector of letter-ngrams. As learning data, we created a new dataset from untapped resources that covers the same domain and period as our Italian Comedy data, as well as resources from common domains, periods, or languages. We obtain a 97.42% Character Recognition Rate and an 86.57% Word Recognition Rate on our Italian Comedy data, despite a lexical coverage of 67% between the Italian Comedy data and the training data. These results show that an efficient system can be obtained by carefully selecting the datasets used for the transfer learning.
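
The abstract names two quantities without defining them on this page: the vector of letter-ngrams that the decoder takes as input, and the lexical coverage between the Italian Comedy data and the training data. The Python sketch below is only an illustration of what such quantities could look like, not the authors' implementation. The function names, the maximum n-gram length of 3, and the type-based (distinct-word) definition of coverage are all assumptions made here; the paper itself should be consulted for the exact definitions.

```python
from collections import Counter


def letter_ngrams(word, n_max=3):
    """Count every letter n-gram of length 1..n_max in `word`.

    Hypothetical helper: n_max=3 is an assumption, not taken from the paper.
    """
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def lexical_coverage(target_words, training_words):
    """Fraction of distinct target words that also occur in the training words.

    Assumption: coverage is computed over distinct word types, not tokens.
    """
    target_vocab = set(target_words)
    if not target_vocab:
        return 0.0
    return len(target_vocab & set(training_words)) / len(target_vocab)
```

Under these assumptions, a coverage of 0.67 would mean that roughly one in three distinct target words never appears in the training data, which is why the decoder has to assemble words character by character instead of picking them from a fixed lexicon.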

https://hal.archives-ouvertes.fr/hal-01868743
Contributor : Adeline Granet
Submitted on : Wednesday, September 5, 2018 - 5:37:31 PM
Last modification on : Friday, June 26, 2020 - 9:05:27 AM
Document(s) archived on : Thursday, December 6, 2018 - 6:55:02 PM

File

COLING_2018.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01868743, version 1

Citation

Adeline Granet, Emmanuel Morin, Harold Mouchère, Solen Quiniou, Christian Viard-Gaudin. Transfer Learning for a Letter-Ngrams to Word Decoder in the Context of Historical Handwriting Recognition with Scarce Resources. 27th International Conference on Computational Linguistics (COLING), Aug 2018, Santa Fe, NM, United States. pp.1474-1484. ⟨hal-01868743⟩
