Conference paper, Year: 2010

A Platform for Storing, Visualizing, and Interpreting Collections of Noisy Documents

Abstract

The goal of document image analysis is to produce interpretations that match those of a fluent and knowledgeable human when viewing the same input. Because computer vision techniques are not perfect, the text that results when processing scanned pages is frequently noisy. Building on previous work, we propose a new paradigm for handling the inevitable incomplete, partial, erroneous, or slightly orthogonal interpretations that commonly arise in document datasets. Starting from the observation that interpretations are dependent on application context or user viewpoint, we describe a platform now under development that is capable of managing multiple interpretations for a document and offers an unprecedented level of interaction so that users can freely build upon, extend, or correct existing interpretations. In this way, the system supports the creation of a continuously expanding and improving document analysis repository which can be used to support research in the field.
Main file
and16-lamiroy-HAL.pdf (643.66 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00516678, version 1 (10-09-2010)

Identifiers

HAL Id: inria-00516678
DOI: 10.1145/1871840.1871844

Cite

Bart Lamiroy, Daniel Lopresti. A Platform for Storing, Visualizing, and Interpreting Collections of Noisy Documents. Fourth Workshop on Analytics for Noisy Unstructured Text Data - AND'10, IAPR, Oct 2010, Toronto, Canada. ⟨10.1145/1871840.1871844⟩. ⟨inria-00516678⟩
84 views
188 downloads
