Structured Multimedia Document Classification

Abstract : We propose a new statistical model for the classification of structured documents and consider its use for multimedia document classification. Its main originality is its ability to take into account simultaneously the structural and the content information present in a structured document, and to cope with different types of content (text, image, etc.). We present experiments on the classification of multilingual pornographic HTML pages using text and image data. The system accurately classifies pornographic sites written in 8 European languages. The corpus used in these experiments was developed by EADS in the context of a large Web site filtering application.

https://hal.archives-ouvertes.fr/hal-01357593
Contributor : Ludovic Denoyer
Submitted on : Tuesday, August 30, 2016 - 10:17:32 AM
Last modification on : Thursday, March 21, 2019 - 2:18:55 PM

Citation

Ludovic Denoyer, Jean-Noël Vittaut, Patrick Gallinari, Sylvie Brunessaux, Stephan Brunessaux. Structured Multimedia Document Classification. ACM Document Engineering, Nov 2003, Grenoble, France. pp.153-160, ⟨10.1145/958220.958249⟩. ⟨hal-01357593⟩
