Conference papers

Structured Multimedia Document Classification

Abstract : We propose a new statistical model for the classification of structured documents and consider its use for multimedia document classification. Its main originality is its ability to take into account simultaneously the structural and content information present in a structured document, and to cope with different types of content (text, images, etc.). We present experiments on the classification of multilingual pornographic HTML pages using text and image data. The system accurately classifies pornographic sites in 8 European languages. The corpus was developed by EADS in the context of a large Web site filtering application.
Contributor : Ludovic Denoyer
Submitted on : Tuesday, August 30, 2016 - 10:17:32 AM
Last modification on : Friday, January 8, 2021 - 5:32:11 PM



Ludovic Denoyer, Jean-Noël Vittaut, Patrick Gallinari, Sylvie Brunessaux, Stephan Brunessaux. Structured Multimedia Document Classification. ACM Document Engineering, Nov 2003, Grenoble, France. pp.153-160, ⟨10.1145/958220.958249⟩. ⟨hal-01357593⟩


