Deep Learning for Saliency Prediction in Natural Video - HAL open archive
Preprint, Working Paper. Year: 2016

Deep Learning for Saliency Prediction in Natural Video

Abstract

The purpose of this paper is the detection of salient areas in natural video using deep learning techniques. Salient patches in video frames are predicted first; predicted visual fixation maps are then built from them. We design the deep architecture on the basis of CaffeNet, implemented with the Caffe toolkit. We show that by changing how training data are selected for optimising the network parameters, the computation cost can be reduced by a factor of up to $12$. We extend deep learning approaches for saliency prediction in still RGB images to the specificity of video by exploiting the sensitivity of the human visual system to residual motion. Furthermore, we complement primary colour pixel values with contrast features proposed in classical visual attention prediction models. The experiments are conducted on two publicly available datasets. The first is the IRCCYN video database, containing $31$ videos with a total of $7300$ frames and the eye fixations of $37$ subjects. The second is HOLLYWOOD2, which provides $2517$ movie clips with the eye fixations of $19$ subjects. On the IRCCYN dataset, the accuracy obtained is $89.51\%$. On the HOLLYWOOD2 dataset, patch saliency prediction results show an improvement of up to $2\%$ over using RGB values alone, with a resulting accuracy of $76.6\%$. Comparing predicted saliency maps with visual fixation maps using the AUC metric shows an increase of up to $16\%$ on a sample of video clips from this dataset.
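The abstract evaluates predicted saliency maps against visual fixation maps with the AUC metric. The abstract does not say which AUC variant the authors use (Judd, Borji, shuffled, etc.), so the following is only a minimal sketch under one common assumption: plain ROC AUC, where fixated pixels are positives and every other pixel is a negative. The function name `saliency_auc` is hypothetical and not taken from the paper.

```python
import numpy as np


def saliency_auc(saliency_map: np.ndarray, fixation_map: np.ndarray) -> float:
    """ROC AUC of a saliency map against a binary fixation map (assumed variant).

    Fixated pixels (fixation_map > 0) are positives; all other pixels are
    negatives. The AUC equals the probability that a random fixated pixel
    scores higher than a random non-fixated pixel (ties count as 1/2),
    computed here with the rank-sum (Mann-Whitney U) formulation.
    """
    scores = np.asarray(saliency_map, dtype=float).ravel()
    labels = np.asarray(fixation_map).ravel() > 0
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("fixation map needs both fixated and non-fixated pixels")

    # 1-based average ranks: tied scores share the mean of their rank positions.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    upper_rank = np.cumsum(counts)                  # rank of the last item in each tie group
    avg_rank = upper_rank - (counts - 1) / 2.0      # mean rank within each tie group
    ranks = avg_rank[inverse.ravel()]

    # Mann-Whitney U for the positives, normalised to [0, 1].
    u_stat = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u_stat / (n_pos * n_neg))


if __name__ == "__main__":
    sal = np.array([[0.9, 0.1], [0.2, 0.8]])
    fix = np.array([[1, 0], [0, 0]])
    print(saliency_auc(sal, fix))  # 1.0: the only fixated pixel has the highest score
```

A constant saliency map yields $0.5$ (chance level), so an AUC increase such as the one reported above is measured relative to that baseline. The rank-based form avoids sweeping explicit thresholds.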
Main file
elsarticle-template_final.pdf (8.32 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01251614, version 1 (12-01-2016)

Identifiers

  • HAL Id: hal-01251614, version 1

Cite

Souad Chaabouni, Jenny Benois-Pineau, Ofer Hadar, Chokri Ben Amar. Deep Learning for Saliency Prediction in Natural Video. 2016. ⟨hal-01251614⟩

Collections

CNRS
