Document Classification in a non-stationary environment: A One-Class SVM Approach

Abstract : In this paper, we investigate a specific area of document classification in which documents arrive as a stream over time. Moreover, the exact number of document classes is not known in advance and may change over time. Performing classification in this setting requires classifiers that can learn incrementally and adapt their models over time. We focus on SVM approaches, which are known to perform well and for which incremental (i-SVM) procedures exist. However, most of these procedures can only handle a fixed number of classes. We therefore designed a new incremental learning procedure based on one-class SVMs. It improves its classification accuracy over time as new labeled data arrive, without any complete retraining. Moreover, when instances arrive with a previously unknown label (the appearance of a new class), the training procedure modifies the classifier model so that it recognizes this new kind of document. Pending the collection of document images as a stream, we ran first experiments on the Optical Recognition of Handwritten Digits Data Set. These experiments show that our incremental approach is able (i) to perform, at each time step, as well as a static one-class classifier fully retrained on all previously seen data, and (ii) to model new incoming classes very quickly and efficiently.

https://hal.archives-ouvertes.fr/hal-01027456
Contributor : Denis Maurel
Submitted on : Tuesday, July 22, 2014 - 9:23:24 AM
Last modification on : Friday, June 7, 2019 - 11:02:03 AM

Identifiers

  • HAL Id : hal-01027456, version 1

Citation

Ahn Khoi Ngo Ho, Nicolas Ragot, Jean-Yves Ramel, Véronique Eglin, Nicolas Sidère. Document Classification in a non-stationary environment: A One-Class SVM Approach. 12th International Conference on Document Analysis and Recognition, Aug 2013, Washington DC, United States. pp.616-620. ⟨hal-01027456⟩
