Block-o-Matic: a Web Page Segmentation Tool and its Evaluation

Andrés Sanoja (1), Stéphane Gançarski (1)
(1) BD - Bases de Données
LIP6 - Laboratoire d'Informatique de Paris 6
Abstract: In this paper we present Block-o-matic, our prototype for web page segmentation, and its counterpart for manual segmentation, Block-o-manual. The main goal is to evaluate the correctness of the segmentation algorithm. Building a ground-truth database for evaluation can take days or months, depending on the collection size; we address this with our manual segmentation tool, which is intended to minimize the time needed to annotate blocks in web pages. Both tools implement the same segmentation rules: the manual version uses them to propose blocks to the assessor, and the automatic version uses them to select blocks. We present our demonstration scenario with a collection of web pages organized into categories. Once annotated, the pages are compared with their automatic segmentation, yielding a score and a visual comparison.
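The abstract does not say how the score is computed. As a rough illustration only, the comparison between manual and automatic blocks could be sketched as below, assuming each block is an axis-aligned rectangle and that two blocks match when their intersection-over-union reaches a threshold. The names `Block`, `iou`, `match_score` and the 0.5 threshold are hypothetical and are not taken from the paper or from either tool.

```python
from typing import List, Tuple

# Hypothetical block representation: (left, top, right, bottom) in pixels.
Block = Tuple[float, float, float, float]


def area(b: Block) -> float:
    """Area of a rectangle; degenerate (inverted) rectangles count as 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Block, b: Block) -> float:
    """Intersection-over-union of two rectangles (assumed similarity measure)."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    inter_area = area(inter)
    union = area(a) + area(b) - inter_area
    return inter_area / union if union > 0 else 0.0


def match_score(manual: List[Block], automatic: List[Block],
                threshold: float = 0.5) -> float:
    """Fraction of manual (ground-truth) blocks that some automatic block
    overlaps with IoU >= threshold. The threshold is an assumption."""
    if not manual:
        return 0.0
    matched = sum(
        1 for m in manual if any(iou(m, a) >= threshold for a in automatic)
    )
    return matched / len(manual)


# Two annotated blocks; the automatic segmentation recovers only the first.
manual_blocks = [(0, 0, 100, 50), (0, 60, 100, 200)]
automatic_blocks = [(0, 0, 100, 55), (0, 150, 100, 200)]
score = match_score(manual_blocks, automatic_blocks)
```

A per-block overlap measure like this one is only one possible choice; it ignores block labels and nesting, which a real evaluation might also need to take into account.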
Document type:
Poster
29ème journées "Base de données avancées", BDA'13, Oct 2013, Nantes, France. 2013

https://hal.archives-ouvertes.fr/hal-00881693
Contributor: Andrés Sanoja
Submitted on: Friday, 8 November 2013 - 17:19:26
Last modified on: Thursday, 22 November 2018 - 14:40:25
Document(s) archived on: Sunday, 9 February 2014 - 09:20:13

Identifiers

  • HAL Id : hal-00881693, version 1

Citation

Andrés Sanoja, Stéphane Gançarski. Block-o-Matic: a Web Page Segmentation Tool and its Evaluation. 29ème journées "Base de données avancées", BDA'13, Oct 2013, Nantes, France. 2013. 〈hal-00881693〉

Metrics

Record views

712

File downloads

1658