Détection d'erreurs dans des transcriptions OCR de documents historiques par réseaux de neurones récurrents multi-niveau

English title : Combining character level and word level RNNs for post-OCR error detection

Abstract : Post-OCR processing consists of first detecting errors, then correcting them when possible. In this context, the ICDAR-2017 Competition on Post-OCR Text Correction was organized to compare approaches on these two tasks. This paper presents an OCR error detection system based on a 2-pass RNN model that combines character-level and word-level representations. The system ranked second on three datasets among 11 participants at the ICDAR-2017 Competition.
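
The abstract does not spell out the architecture, so the following is only a minimal sketch of one way character-level and word-level recurrent networks could be combined for token-level error detection. It assumes a character RNN that encodes each token, followed by a word RNN that reads the token vectors and outputs an error probability per token. The class `TwoLevelDetector`, its dimensions, and the untrained random weights are all hypothetical and are not taken from the paper.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_step(W_in, W_rec, x, h):
    """One Elman RNN step: h' = tanh(W_in x + W_rec h)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_in[i], x))
            + sum(w * hi for w, hi in zip(W_rec[i], h))
        )
        for i in range(len(h))
    ]


class TwoLevelDetector:
    """Hypothetical character-level + word-level RNN tagger (assumption)."""

    def __init__(self, alphabet, char_dim=8, word_dim=6, seed=0):
        rng = random.Random(seed)
        self.char_index = {c: i for i, c in enumerate(alphabet)}
        self.n = len(alphabet) + 1  # extra slot for unknown characters
        self.char_dim, self.word_dim = char_dim, word_dim
        self.Wc_in = make_matrix(char_dim, self.n, rng)
        self.Wc_rec = make_matrix(char_dim, char_dim, rng)
        self.Ww_in = make_matrix(word_dim, char_dim, rng)
        self.Ww_rec = make_matrix(word_dim, word_dim, rng)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(word_dim)]

    def encode_token(self, token):
        """Character level: run an RNN over the characters of one token."""
        h = [0.0] * self.char_dim
        for ch in token:
            x = [0.0] * self.n
            x[self.char_index.get(ch, self.n - 1)] = 1.0  # one-hot input
            h = rnn_step(self.Wc_in, self.Wc_rec, x, h)
        return h

    def predict(self, tokens):
        """Word level: run a second RNN over token vectors, score each token."""
        h = [0.0] * self.word_dim
        scores = []
        for token in tokens:
            h = rnn_step(self.Ww_in, self.Ww_rec, self.encode_token(token), h)
            z = sum(w * hi for w, hi in zip(self.w_out, h))
            scores.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid: P(error)
        return scores


detector = TwoLevelDetector("abcdefghijklmnopqrstuvwxyz")
tokens = "tbe quick brovvn fox".split()
scores = detector.predict(tokens)  # one score per token, weights untrained
```

Because the weights are random, the scores carry no meaning until the model is trained on aligned OCR and ground-truth text. The sketch only shows how the two levels of representation fit together.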

Cited literature : 15 references

https://hal.archives-ouvertes.fr/hal-01905258
Contributor : Thibault Magallon
Submitted on : Thursday, October 25, 2018 - 4:24:40 PM
Last modification on : Tuesday, December 18, 2018 - 8:04:08 AM
Long-term archiving on : Saturday, January 26, 2019 - 3:26:37 PM

File

TALN_VFinal.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01905258, version 1

Citation

Thibault Magallon, Frédéric Béchet, Benoit Favre. Détection d'erreurs dans des transcriptions OCR de documents historiques par réseaux de neurones récurrents multi-niveau. 25e conférence sur le Traitement Automatique des Langues Naturelles (TALN), May 2018, Rennes, France. ⟨hal-01905258⟩
