Symbol and Spatial Relation Knowledge Extraction Applied to On-Line Handwritten Scripts

Jinpeng Li 1
1 irccyn-ivc
IRCCyN - Institut de Recherche en Communications et en Cybernétique de Nantes
Abstract : Our work concerns knowledge extraction from graphical languages whose symbols are a priori unknown. We assume that observing a large quantity of documents should make it possible to discover the symbols of the language under consideration. The difficulty of the problem lies in the two-dimensional and handwritten nature of the graphical languages that we study. We consider online handwriting produced with interfaces such as touch screens, interactive whiteboards or electronic pens. The signal is then available as a sampled trajectory of the pen or fingertip, producing a sequence of strokes, each composed of a sequence of points. A symbol, the basic element of the alphabet of the language, is composed of a set of strokes with specific structural and relational properties. Symbols are extracted by uncovering repetitive subgraphs in a global graph that models the strokes (nodes) and their spatial relationships (arcs) across the entire document set. The minimum description length (MDL) principle is used to select the best representatives of the symbol set. This work was validated on two experimental datasets: the first contains simple mathematical expressions, and the second is composed of graphical flowcharts. On these datasets, we assess the quality of the extracted symbols and compare them to the ground truth. Finally, we address reducing the annotation workload of a database by considering both the segmentation and the labeling of the strokes.
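The graph-and-MDL idea described in the abstract can be illustrated with a toy sketch. It is not the method of the thesis: it assumes each stroke already carries a hypothetical shape label ("h", "v"), uses a crude four-way spatial relation between bounding-box centres, restricts candidate patterns to two-stroke subgraphs, and replaces a real MDL encoding with a unit cost per node and per arc. All function names and the max_dist parameter are illustrative assumptions.

```python
from itertools import combinations

def center(stroke):
    """Centre of the bounding box of a stroke given as a list of (x, y) points."""
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)

def relation(a, b):
    """Coarse position of stroke b relative to stroke a (y grows downward)."""
    (ax, ay), (bx, by) = center(a), center(b)
    dx, dy = bx - ax, by - ay
    if abs(dx) >= abs(dy):
        return "right" if dx >= 0 else "left"
    return "below" if dy >= 0 else "above"

def build_arcs(strokes, max_dist):
    """Link every pair of strokes whose centres are at most max_dist apart."""
    arcs = []
    for i, j in combinations(range(len(strokes)), 2):
        (ax, ay), (bx, by) = center(strokes[i]), center(strokes[j])
        if ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 <= max_dist:
            arcs.append((i, j, relation(strokes[i], strokes[j])))
    return arcs

def description_length(pattern, labels, arcs):
    """Unit-cost length of the graph after collapsing non-overlapping instances.

    Assumed cost model: the pattern itself costs 3 (two nodes, one arc), and
    each collapsed instance removes one node and one arc from the graph.
    """
    used, n_inst = set(), 0
    for i, j, rel in arcs:
        if (labels[i], rel, labels[j]) == pattern and i not in used and j not in used:
            used.update((i, j))
            n_inst += 1
    return 3 + (len(labels) - n_inst) + (len(arcs) - n_inst)

def best_pattern(labels, arcs):
    """Candidate two-stroke pattern giving the shortest description length."""
    candidates = sorted({(labels[i], rel, labels[j]) for i, j, rel in arcs})
    return min(candidates, key=lambda p: description_length(p, labels, arcs))

# Toy document: two "=" signs (pairs of horizontal strokes) and one vertical bar.
strokes = [
    [(0, 0), (4, 0)], [(0, 2), (4, 2)],      # first "="
    [(10, -2), (10, 4)],                     # vertical bar
    [(20, 0), (24, 0)], [(20, 2), (24, 2)],  # second "="
]
labels = ["h", "h", "v", "h", "h"]           # hypothetical stroke shape labels

arcs = build_arcs(strokes, max_dist=9)
best = best_pattern(labels, arcs)
print(best, description_length(best, labels, arcs))
```

In this toy document the pattern ("h", "below", "h") occurs twice without overlap and shortens the description, whereas ("h", "right", "v") has two occurrences that share the same vertical stroke and therefore counts only once. A real system would need larger subgraphs, a proper coding scheme and a stroke labeling step, none of which are shown here.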
Document type :
Theses
Cited literature : 74 references

https://tel.archives-ouvertes.fr/tel-00785984
Contributor : Harold Mouchère
Submitted on : Thursday, February 7, 2013 - 2:16:15 PM
Last modification on : Wednesday, December 19, 2018 - 3:02:08 PM
Long-term archiving on: Wednesday, May 8, 2013 - 3:55:03 AM

Identifiers

  • HAL Id : tel-00785984, version 1

Citation

Jinpeng Li. Symbol and Spatial Relation Knowledge Extraction Applied to On-Line Handwritten Scripts. Automatic Control Engineering. Université de Nantes Angers Le Mans, 2012. English. ⟨tel-00785984⟩
