An annotation assistance system using an unsupervised codebook composed of handwritten graphical multi-stroke symbols

Jinpeng Li 1,*, Harold Mouchère 1, Christian Viard-Gaudin 1
* Corresponding author
1 IRCCyN-IVC, IRCCyN - Institut de Recherche en Communications et en Cybernétique de Nantes
Abstract: Many current recognition systems rely on ground-truthed datasets for training, evaluation and testing, but creating such datasets is a tedious task. This paper proposes an iterative, unsupervised framework for learning handwritten graphical symbols that can assist this labeling task. Each stroke is initially treated as a segment, and we construct a relational graph in which the nodes are the segments and the edges are the spatial relations between them. To extract the relevant patterns, both the segments and the spatial relations are quantized. Discovering graphical symbols then becomes the problem of finding sub-graphs according to the Minimum Description Length (MDL) principle. The discovered graphical symbols become the new segments for the next iteration. In each iteration, the quantization of segments yields a codebook in which the user can label graphical symbols. The method was first applied to a dataset of simple mathematical expressions. The results reported in this work show that only 58.2% of the strokes have to be labeled manually.
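The discovery loop summarized in the abstract can be illustrated with a deliberately simplified sketch. It assumes quantization has already been done, so every segment carries a codebook label and every spatial relation a relation label (the labels `hbar`, `vbar`, `cross` and `right` used below are hypothetical). Candidate sub-graphs are restricted to two connected segments, instances are collected greedily without overlap, and description length is approximated by a crude bit count. All function names are illustrative. This is not the paper's actual encoding or search; it only shows the idea of selecting the sub-graph that best compresses the graph and merging its instances into new segments for the next iteration.

```python
import math

# Simplified sketch (assumption): segments are already quantized, so each node
# carries a codebook label and each edge (u, v, relation) a quantized relation label.
# The bit counts below are a crude stand-in for a real MDL encoding.


def description_length(nodes, edges):
    """Approximate bits needed to encode node labels and labelled edges."""
    n_nodes = max(len(nodes), 1)
    n_node_labels = max(len(set(nodes.values())), 1)
    n_rel_labels = max(len({r for _, _, r in edges}), 1)
    bits_per_node = math.log2(n_node_labels) + 1
    bits_per_edge = 2 * math.log2(n_nodes) + math.log2(n_rel_labels) + 1
    return len(nodes) * bits_per_node + len(edges) * bits_per_edge


def make_label(pattern):
    """Name of the new segment created by merging a two-node pattern."""
    src, rel, dst = pattern
    return f"[{src} {rel} {dst}]"


def find_instances(nodes, edges, pattern):
    """Greedily collect non-overlapping edges matching (src label, relation, dst label)."""
    used, instances = set(), []
    for u, v, r in edges:
        if (nodes[u], r, nodes[v]) == pattern and u not in used and v not in used:
            used.update((u, v))
            instances.append((u, v))
    return instances


def compress(nodes, edges, instances, new_label):
    """Merge each instance (u, v) into node u with new_label; redirect outer edges."""
    new_nodes = dict(nodes)
    merged_into = {}
    for u, v in instances:
        merged_into[v] = u
        del new_nodes[v]
        new_nodes[u] = new_label
    new_edges = []
    for u, v, r in edges:
        a, b = merged_into.get(u, u), merged_into.get(v, v)
        if a != b:  # edges inside a merged symbol disappear
            new_edges.append((a, b, r))
    return new_nodes, new_edges


def best_substructure(nodes, edges):
    """Return (compression ratio, pattern, instances) of the best repeated pattern."""
    dl_graph = description_length(nodes, edges)
    best = None
    for pattern in sorted({(nodes[u], r, nodes[v]) for u, v, r in edges}):
        instances = find_instances(nodes, edges, pattern)
        if len(instances) < 2:  # only repeated patterns are worth a codebook entry
            continue
        sub_nodes, sub_edges = {0: pattern[0], 1: pattern[2]}, [(0, 1, pattern[1])]
        g_nodes, g_edges = compress(nodes, edges, instances, make_label(pattern))
        dl_total = description_length(sub_nodes, sub_edges) + description_length(g_nodes, g_edges)
        ratio = dl_graph / dl_total
        if best is None or ratio > best[0]:
            best = (ratio, pattern, instances)
    return best


def discover(nodes, edges, max_iter=10):
    """Iteratively replace the best-compressing pattern by new segments."""
    codebook = []
    for _ in range(max_iter):
        best = best_substructure(nodes, edges)
        if best is None or best[0] <= 1.0:  # stop when nothing compresses the graph
            break
        _, pattern, instances = best
        label = make_label(pattern)
        nodes, edges = compress(nodes, edges, instances, label)
        codebook.append(label)
    return codebook, nodes, edges
```

For instance, with two plus signs each drawn as a horizontal bar crossing a vertical bar and separated by digits, the first iteration merges both bar pairs into one new codebook entry, `[hbar cross vbar]`, and the loop then stops because no remaining pattern occurs twice.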

https://hal.archives-ouvertes.fr/hal-00766693
Contributor: Harold Mouchère
Submitted on: Tuesday, December 18, 2012 - 4:27:01 PM
Last modification on: Wednesday, December 19, 2018 - 3:02:08 PM

Citation

Jinpeng Li, Harold Mouchère, Christian Viard-Gaudin. An annotation assistance system using an unsupervised codebook composed of handwritten graphical multi-stroke symbols. Pattern Recognition Letters, Elsevier, 2014, 35 (1), pp. 46-57. ⟨10.1016/j.patrec.2012.11.018⟩. ⟨hal-00766693⟩