Multi One-Class Incremental SVM for Document Stream Digitization

Abstract : Within the DIGIDOC project (ANR-10-CORD-0020)-CONTenus et INTeractions (CONTINT), our approach was applied to several image stream classification scenarios that can correspond to real cases in digitization projects. Most of the time, document processing is considered a well-defined task: the classes (also called concepts) are defined and known before processing starts. In real industrial document processing workflows, however, the concepts frequently change over time. In document stream processing, the information and content of the digitized pages can evolve over time, as can the user's judgment about what to do with the resulting classification. The goal of this application is to create a learning module for stream-based document image classification (especially dedicated to digitization processes with a huge volume of data) that adapts to different situations in intelligent scanning tasks: adding, extending, contracting, splitting, or merging classes in an online mode of streaming data processing.
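
The class operations listed in the abstract (adding, extending, splitting, merging) can be illustrated with a minimal sketch that keeps one independent one-class model per class. This is not the authors' implementation: the paper uses incremental one-class SVMs, whereas the stand-in below is a simple centroid-plus-radius region, and every name in it (`OneClassModel`, `MultiOneClass`, `margin`, the labels) is hypothetical.

```python
import math


class OneClassModel:
    """Stand-in one-class model (NOT an SVM): centroid of stored samples plus a radius."""

    def __init__(self, margin=1.0):
        self.samples = []      # kept so that classes can later be split or merged
        self.margin = margin   # assumed slack added around the enclosing radius

    def learn(self, x):
        self.samples.append(tuple(x))

    def score(self, x):
        # Score >= 0 means x falls inside the region learned for this class.
        dim = len(self.samples[0])
        center = [sum(s[i] for s in self.samples) / len(self.samples)
                  for i in range(dim)]
        radius = max(math.dist(center, s) for s in self.samples) + self.margin
        return radius - math.dist(center, x)


class MultiOneClass:
    """One independent one-class model per label, updated as the stream arrives."""

    def __init__(self):
        self.models = {}

    def learn(self, label, x):
        # An unseen label adds a class; a known label extends the existing one.
        self.models.setdefault(label, OneClassModel()).learn(x)

    def predict(self, x):
        # Best-scoring class, or None when every model rejects x.
        best = max(self.models, key=lambda k: self.models[k].score(x), default=None)
        if best is None or self.models[best].score(x) < 0:
            return None
        return best

    def merge(self, a, b, new_label):
        # Merging only touches the two models involved.
        merged = OneClassModel()
        merged.samples = self.models.pop(a).samples + self.models.pop(b).samples
        self.models[new_label] = merged

    def split(self, label, assign):
        # assign(sample) returns the new label for each stored sample.
        for s in self.models.pop(label).samples:
            self.learn(assign(s), s)


clf = MultiOneClass()
for x in [(0, 0), (1, 0), (0, 1)]:
    clf.learn("invoice", x)
for x in [(10, 10), (11, 10)]:
    clf.learn("letter", x)
```

Because each class has its own model, a change to one concept leaves the other models untouched, which is the property that makes this kind of decomposition convenient for online streams. A sample rejected by every model (`predict` returns `None`) can be treated as a candidate for a new class.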

https://hal.archives-ouvertes.fr/hal-01307021
Contributor : Nicolas Ragot
Submitted on : Tuesday, May 3, 2016 - 1:34:06 PM
Last modification on : Tuesday, July 2, 2019 - 4:02:03 PM
Long-term archiving on : Tuesday, May 24, 2016 - 4:25:31 PM

File

DAS-Ver07.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01307021, version 1

Citation

Anh Khoi Ngo Ho, Véronique Eglin, Nicolas Ragot, Jean-Yves Ramel. Multi One-Class Incremental SVM for Document Stream Digitization. 12th IAPR International Workshop on Document Analysis Systems (DAS 2016), Apr 2016, Santorini, Greece. ⟨hal-01307021⟩
