Block-o-Matic: a Web Page Segmentation Tool and its Evaluation

Andrés Sanoja¹, Stéphane Gançarski¹
¹ BD - Bases de Données, LIP6 - Laboratoire d'Informatique de Paris 6
Abstract: In this paper we present Block-o-Matic, our prototype for web page segmentation, and its counterpart Block-o-manual, a tool for manual segmentation. The main goal is to evaluate the correctness of the segmentation algorithm. Building a ground-truth database for evaluation can take days or months, depending on the collection size. We address this with our manual segmentation tool, which is designed to minimize the time needed to annotate blocks in web pages. Both tools implement the same segmentation rules. The manual tool uses them to propose candidate blocks to the assessor, and the automatic tool uses them to select blocks. We present our demonstration scenario using a collection of web pages organized into categories. Once the pages are annotated, they are compared with the automatic segmentation, which yields a score and a visual comparison.
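The abstract does not say how the manual and automatic segmentations are scored. The sketch below shows one plausible way such a comparison could work, and it is not the Block-o-Matic metric.

It makes several hypothetical assumptions:
- Each block is represented as an axis-aligned rectangle (`Block`) on the rendered page.
- Blocks are matched greedily and one-to-one by intersection-over-union (IoU).
- A match counts when the IoU reaches an arbitrary `threshold` of 0.5.
- The score is the fraction of manually annotated (ground-truth) blocks that found a match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical block representation: a rectangle on the rendered page."""
    x: float
    y: float
    w: float
    h: float

    def area(self) -> float:
        return self.w * self.h


def iou(a: Block, b: Block) -> float:
    """Intersection-over-union of two axis-aligned rectangles."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def segmentation_score(manual: list[Block], automatic: list[Block],
                       threshold: float = 0.5) -> float:
    """Hypothetical score: fraction of manual blocks matched by a distinct
    automatic block with IoU >= threshold (greedy one-to-one matching)."""
    if not manual:
        return 1.0 if not automatic else 0.0
    unused = list(automatic)
    matched = 0
    for gt in manual:
        # Pick the still-unused automatic block that overlaps this manual block best.
        best = max(unused, key=lambda b: iou(gt, b), default=None)
        if best is not None and iou(gt, best) >= threshold:
            matched += 1
            unused.remove(best)  # each automatic block can match only once
    return matched / len(manual)


manual = [Block(0, 0, 100, 50), Block(0, 50, 100, 50)]
automatic = [Block(0, 0, 100, 48), Block(0, 48, 100, 52)]
print(segmentation_score(manual, automatic))  # 1.0
```

Greedy matching keeps the sketch short. An optimal assignment (for example, the Hungarian algorithm) would avoid order-dependent matches when blocks overlap heavily.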
Document type: Poster communication

https://hal.archives-ouvertes.fr/hal-00881693
Contributor: Andrés Sanoja
Submitted on: Friday, November 8, 2013 - 5:19:26 PM
Last modification on: Thursday, March 21, 2019 - 1:00:51 PM
Long-term archiving on: Sunday, February 9, 2014 - 9:20:13 AM

Identifiers

  • HAL Id: hal-00881693, version 1

Citation

Andrés Sanoja, Stéphane Gançarski. Block-o-Matic: a Web Page Segmentation Tool and its Evaluation. 29ème journées "Base de données avancées", BDA'13, Oct 2013, Nantes, France. 2013. ⟨hal-00881693⟩

