Conference paper, Year: 2015

Paragraph text segmentation into lines with Recurrent Neural Networks

Christopher Kermorvant
Jérôme Louradour
  • Role: Author

Abstract

The detection of text lines, as a first processing step, is critical in all Text Recognition systems. State-of-the-art methods to locate lines of text are based on handcrafted heuristics fine-tuned through the Image Processing community's experience. They succeed under certain constraints; for instance, the background has to be roughly uniform. We propose to use more "agnostic" Machine Learning-based approaches to address text line location. The main motivation is to be able to process either damaged documents or flows of documents with a high variety of layouts and other characteristics. A new method is presented in this work, inspired by the latest generation of optical models used for Text Recognition, namely Recurrent Neural Networks. As these models are sequential, a column of text lines in our application plays the same role as a line of characters in more traditional text recognition settings. A key advantage of the proposed method over other data-driven approaches is that compiling a training dataset does not require labeling line boundaries: only the number of lines is required for each paragraph. Experimental results show that our approach gives similar or better results than traditional handcrafted approaches, with little engineering effort and less hyper-parameter tuning.
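The abstract does not name the training objective, so the following is only a minimal sketch of how a line count alone could act as supervision, assuming a CTC-style labeling like the one commonly used in text recognition. The label ids `LINE` and `BLANK` and the three helper functions are hypothetical names chosen for illustration; they are not taken from the paper.

```python
from typing import List

# Hypothetical label ids (assumptions, not from the paper).
BLANK = 0  # CTC-style blank, standing in here for inter-line space
LINE = 1   # a single "text line" label


def target_from_line_count(n_lines: int) -> List[int]:
    """Build a weak target: one LINE label per text line, with no positions."""
    if n_lines < 0:
        raise ValueError("n_lines must be non-negative")
    return [LINE] * n_lines


def collapse(frame_labels: List[int]) -> List[int]:
    """CTC-style collapse: merge consecutive repeats, then drop blanks."""
    collapsed: List[int] = []
    previous = None
    for label in frame_labels:
        if label != previous and label != BLANK:
            collapsed.append(label)
        previous = label
    return collapsed


def count_lines(frame_labels: List[int]) -> int:
    """Number of lines implied by a per-row label sequence."""
    return len(collapse(frame_labels))


# One label per pixel row of a paragraph image, read top to bottom.
rows = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0]
print(count_lines(rows))  # 3
```

Under this assumption, a paragraph annotated only with "3 lines" yields the target `[LINE, LINE, LINE]`, and any per-row labeling that collapses to that target is an acceptable alignment, so no line boundary needs to be annotated.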
No file deposited

Dates and versions

hal-01151760, version 1 (13-05-2015)

Identifiers

  • HAL Id: hal-01151760, version 1

Cite

Bastien Moysset, Christopher Kermorvant, Christian Wolf, Jérôme Louradour. Paragraph text segmentation into lines with Recurrent Neural Networks. International Conference on Document Analysis and Recognition, Aug 2015, Tunisia, Tunisia. ⟨hal-01151760⟩