A Vectorization and Decision Tree Based Text-Graphics Separation Algorithm for Bangla Maps
Abstract
This paper proposes a technique for text-graphics separation in geographical maps based on vectorization and decision tree classification. Each map image is first vectorized, which yields a set of structural primitives. Features that characterize text and graphics are then associated with these primitives, and a decision tree uses the extracted features to discriminate text from graphics. The method makes a binary decision for every vectorized component, classifying it as either graphic or text. The proposed method was tested on a dataset of grey-level Bangla map images (Bangla is a popular Indian regional language). It achieves text extraction accuracies of 72.6% at the character level and 67.01% at the word level.
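The pipeline summarized above (vectorized components, per-component features, a binary tree decision) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the `Component` polyline representation, the three features (`length`, `extent`, `segments`), and the thresholds `max_text_extent` and `min_text_segments` are hypothetical, and the tree is hand-written rather than learned from data.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Component:
    """One vectorized component, stored as a polyline (hypothetical representation)."""
    points: List[Point]


def extract_features(c: Component) -> Dict[str, float]:
    """Compute illustrative features from the structural primitives of a component."""
    xs = [p[0] for p in c.points]
    ys = [p[1] for p in c.points]
    # Total length of all straight segments in the polyline.
    length = sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(c.points, c.points[1:])
    )
    # Largest side of the bounding box.
    extent = max(max(xs) - min(xs), max(ys) - min(ys))
    return {"length": length, "extent": extent, "segments": len(c.points) - 1}


def classify(c: Component,
             max_text_extent: float = 40.0,
             min_text_segments: int = 3) -> str:
    """Hand-written two-level decision tree returning 'text' or 'graphic'.

    The thresholds are assumed values, not figures reported in the paper.
    """
    f = extract_features(c)
    if f["extent"] > max_text_extent:
        return "graphic"   # long strokes: roads, rivers, borders
    if f["segments"] >= min_text_segments:
        return "text"      # small but structurally complex: likely a glyph
    return "graphic"       # small and simple: dash, tick mark, dot


road = Component([(0, 0), (200, 10), (400, 0)])
glyph = Component([(0, 0), (5, 10), (10, 0), (5, 5), (0, 0)])
dash = Component([(0, 0), (10, 0)])
labels = [classify(c) for c in (road, glyph, dash)]
```

In practice the split features and thresholds would be learned from labelled map components rather than fixed by hand; the sketch only shows the shape of a binary per-component decision.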