Learning to Detect Tables in Scanned Document Images using Line Information

This paper presents a method to detect table regions in document images by identifying the column and row line separators and their properties. The method employs a run-length approach to identify the horizontal and vertical lines present in the input image. From each group of intersecting horizontal and vertical lines, a set of 26 low-level features is extracted, and an SVM classifier is used to decide whether the group belongs to a table. The performance of the method is evaluated on a heterogeneous corpus of French, English and Arabic documents that contain various types of table structures, and is compared with that of the Tesseract OCR system.
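As an illustration of the run-length idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a binarized image given as a list of rows of 0/1 values, and treats any run of foreground pixels at least `min_length` long as a candidate line. The function names, the `min_length` parameter and the intersection test are hypothetical choices made for this example; the abstract does not specify them.

```python
def horizontal_runs(image, min_length):
    """Return (row, col_start, col_end) for every horizontal run of
    foreground pixels (value 1) that is at least min_length pixels long.
    The min_length threshold is an assumption of this sketch."""
    runs = []
    for r, row in enumerate(image):
        start = None  # column where the current run began, if any
        for c, pixel in enumerate(row):
            if pixel and start is None:
                start = c
            elif not pixel and start is not None:
                if c - start >= min_length:
                    runs.append((r, start, c - 1))
                start = None
        # Close a run that reaches the right edge of the image.
        if start is not None and len(row) - start >= min_length:
            runs.append((r, start, len(row) - 1))
    return runs


def vertical_runs(image, min_length):
    """Return (col, row_start, row_end) for every long vertical run,
    found by scanning the transposed image row by row."""
    transposed = [list(column) for column in zip(*image)]
    return horizontal_runs(transposed, min_length)


def intersects(h_run, v_run):
    """True if a horizontal run (row, c0, c1) crosses or touches
    a vertical run (col, r0, r1)."""
    row, c0, c1 = h_run
    col, r0, r1 = v_run
    return c0 <= col <= c1 and r0 <= row <= r1


# A 5x5 box outline: two horizontal and two vertical candidate lines.
image = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
h_lines = horizontal_runs(image, 4)
v_lines = vertical_runs(image, 4)
crossings = [(h, v) for h in h_lines for v in v_lines if intersects(h, v)]
```

Grouping lines that intersect, as in `crossings`, gives the kind of candidate line group from which features could then be computed and passed to a classifier.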

Keywords

table detection, runlength, document image analysis

Domains

Document and Text Processing

Main file

kasar_icdar2013.pdf (2.87 MB)

Origin: Files produced by the author(s)

Contributor: Clément Chatelain

https://hal.science/hal-00905546

Submitted on: Tuesday, 19 November 2013, 11:33:50

Last modified on: Friday, 22 December 2023, 15:16:05

Long-term archiving on: Thursday, 20 February 2014, 03:35:13

Dates and versions

hal-00905546, version 1 (19-11-2013)

Identifiers

HAL Id: hal-00905546, version 1

Cite

Thotreingam Kasar, Philippine Barlas, Sebastien Adam, Clément Chatelain, Thierry Paquet. Learning to Detect Tables in Scanned Document Images using Line Information. ICDAR, 2013, France. pp.1185-1189. ⟨hal-00905546⟩


Collections

INSA-ROUEN LITIS COMUE-NORMANDIE UNIROUEN UNILEHAVRE INSA-GROUPE
