
Yet Another Hybrid Segmentation Tool

Andrés Sanoja 1, Stéphane Gançarski 1
1 BD - Bases de Données
LIP6 - Laboratoire d'Informatique de Paris 6
Abstract : In this paper we present an overview of a prototype we are developing in the context of web archives (page comparison, crawling and information retrieval). It analyses pages based on their DOM tree information and their visual rendering. The tool implements a modified version of VIPS with the aim of enhancing the precision of visual block extraction and hierarchy construction. First, the visual rendering of a page, as produced by several browsers, is segmented into rectangular blocks. Then, the extracted blocks are checked for visual overlaps, which are resolved using an adapted version of the XY-Cut algorithm. As a result, blocks may be rectangular or non-rectangular. Finally, the visual block tree, which represents the layout of the page, is analysed in order to obtain a more coherent layout.
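The overlap check and the cut step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Block` type, the function names and the sample coordinates are all hypothetical, and `split_by_gaps` shows one step of the classic XY-Cut idea (grouping blocks separated by empty gaps along one axis), not the adapted version the abstract refers to.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Block:
    """Axis-aligned rectangle (hypothetical type, not taken from the paper)."""
    name: str
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def overlaps(a: Block, b: Block) -> bool:
    """True when the two rectangles share a region of non-zero area."""
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


def overlapping_pairs(blocks):
    """Return the names of every pair of blocks that visually overlap."""
    return [(a.name, b.name) for a, b in combinations(blocks, 2) if overlaps(a, b)]


def split_by_gaps(blocks, axis):
    """One classic XY-Cut step: group blocks separated by empty gaps.

    axis="y" groups blocks into rows; axis="x" groups them into columns.
    """
    lo, hi = ("top", "bottom") if axis == "y" else ("left", "right")
    groups, reach = [], None
    for b in sorted(blocks, key=lambda blk: getattr(blk, lo)):
        if reach is None or getattr(b, lo) >= reach:
            groups.append([b])           # gap found: start a new group
            reach = getattr(b, hi)
        else:
            groups[-1].append(b)         # projections overlap: same group
            reach = max(reach, getattr(b, hi))
    return groups


# Made-up page layout used only for illustration.
blocks = [
    Block("header", 0, 0, 100, 20),
    Block("menu", 0, 30, 30, 90),
    Block("content", 25, 30, 100, 90),   # overlaps "menu" by 5 units
    Block("footer", 0, 100, 100, 120),
]

pairs = overlapping_pairs(blocks)
rows = [[b.name for b in g] for g in split_by_gaps(blocks, "y")]
print(pairs)
print(rows)
```

In this sample, "menu" and "content" are reported as overlapping and end up in the same row, which is the kind of case the tool is described as resolving. How the overlap is actually resolved, possibly producing non-rectangular blocks, is not specified in the abstract and is left out here.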
Document type : Poster communications

Contributor : Andrés Sanoja
Submitted on : Monday, January 7, 2013 - 9:47:34 AM
Last modification on : Friday, January 8, 2021 - 5:32:09 PM
Long-term archiving on : Saturday, April 1, 2017 - 12:46:49 AM


  • HAL Id : hal-00770527, version 1


Andrés Sanoja, Stéphane Gançarski. Yet Another Hybrid Segmentation Tool. iPRES 2012 – 9th International Conference on Preservation of Digital Objects, Oct 2012, Toronto, Canada. 2012. ⟨hal-00770527⟩


