
Structural and Visual Similarity Learning for Web Page Archiving

Marc Teva Law (1), Carlos Sureda Gutierrez (1), Nicolas Thome (1), Stéphane Gançarski (2), Matthieu Cord (1)
(1) MALIRE - Machine Learning and Information Retrieval, LIP6 - Laboratoire d'Informatique de Paris 6
(2) BD - Bases de Données, LIP6 - Laboratoire d'Informatique de Paris 6
Abstract : This paper presents a Web page archiving approach that combines image-based and structural techniques. Our main goal is to learn a similarity measure between Web pages in order to detect whether successive versions of a page are similar. The system builds on a visual similarity measure designed for Web pages. We combine it with a structural analysis of Web page source code and propose a supervised feature selection method adapted to Web archiving. We report experiments on real Web archives and discuss scalability issues.
Document type : Conference papers
Contributor : Lip6 Publications
Submitted on : Tuesday, February 9, 2016 - 3:52:51 PM
Last modification on : Friday, January 8, 2021 - 5:34:11 PM

Marc Teva Law, Carlos Sureda Gutierrez, Nicolas Thome, Stéphane Gançarski, Matthieu Cord. Structural and Visual Similarity Learning for Web Page Archiving. 10th workshop on Content-Based Multimedia Indexing (CBMI), Jun 2012, Annecy, France. pp.1-6, ⟨10.1109/CBMI.2012.6269849⟩. ⟨hal-01271765⟩


