Machine Learning for Semi-Structured Multimedia Documents : Application to pornographic filtering and thematic categorization

Ludovic Denoyer 1, Patrick Gallinari 1
1 MALIRE - Machine Learning and Information Retrieval
LIP6 - Laboratoire d'Informatique de Paris 6
Abstract : We propose a generative statistical model for the classification of semi-structured multimedia documents. Its main originality is that it takes into account both the structural and the content information present in a semi-structured document, and that it copes with different types of content (text, image, etc.). We then present the results of two sets of experiments:
• The first concerns the filtering of pornographic Web pages.
• The second concerns the thematic classification of Wikipedia documents.
Document type :
Book sections

https://hal.archives-ouvertes.fr/hal-01305052
Contributor : Lip6 Publications
Submitted on : Wednesday, April 20, 2016 - 4:14:10 PM
Last modification on : Thursday, March 21, 2019 - 1:19:37 PM

Citation

Ludovic Denoyer, Patrick Gallinari. Machine Learning for Semi-Structured Multimedia Documents : Application to pornographic filtering and thematic categorization. Machine Learning Techniques for Multimedia Content, Springer, pp.227-247, 2008, Cognitive Technologies, 978-3-540-75170-0. ⟨10.1007/978-3-540-75171-7_10⟩. ⟨hal-01305052⟩
