Journal article in Neurocomputing, 2017

Toward an audiovisual attention model for multimodal video content

Abstract

Visual attention modeling is a very active research field, and several image and video attention models have been proposed during the last decade. However, despite the conclusions drawn from various studies about the influence of sound on human gaze, most classical video attention models do not account for the multimodal nature of video (visual and auditory cues). In this paper, we propose an audiovisual saliency model that aims to predict human gaze maps during the exploration of video content. The model, intended for videoconferencing, is based on the fusion of spatial, temporal and auditory attentional maps. Building on a real-time audiovisual speaker localization approach, the proposed auditory map is modulated depending on the nature of the faces in the video, i.e., speaker or listener. State-of-the-art performance measures have been used to compare the predicted saliency maps with the eye-tracking ground truth. The results show the very good performance of the proposed model and a significant improvement over non-audio models.
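Since the full text is not deposited here, the exact fusion rule and modulation scheme are not available; the Python sketch below is only a plausible rendering of the pipeline the abstract describes. All names and parameters (fuse_saliency_maps, the linear weights, speaker_gain, listener_gain) are illustrative assumptions, and NSS is given as one example of a standard saliency performance measure, not necessarily the one used in the paper.

import numpy as np

def fuse_saliency_maps(spatial, temporal, auditory, w_s=1.0, w_t=1.0, w_a=1.0):
    """Linearly fuse normalized spatial, temporal and auditory attention
    maps into one saliency map. Weights are illustrative, not the paper's."""
    def normalize(m):
        m = m.astype(np.float64)
        rng = m.max() - m.min()
        return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)
    fused = (w_s * normalize(spatial)
             + w_t * normalize(temporal)
             + w_a * normalize(auditory))
    return normalize(fused)

def auditory_map_from_faces(shape, faces, speaker_gain=1.0, listener_gain=0.3):
    """Build an auditory attention map: each detected face contributes a
    Gaussian blob whose amplitude depends on whether the face belongs to
    the localized speaker or to a silent listener (hypothetical gains)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    amap = np.zeros(shape, dtype=np.float64)
    for (cx, cy, radius, is_speaker) in faces:
        gain = speaker_gain if is_speaker else listener_gain
        amap += gain * np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2)
                              / (2.0 * radius ** 2))
    return amap

def nss(saliency, fixations):
    """Normalized Scanpath Saliency: mean standardized saliency value at
    human fixation locations (fixations is a boolean mask)."""
    s = (saliency - saliency.mean()) / (saliency.std() + 1e-12)
    return s[fixations.astype(bool)].mean()

# Example usage with random stand-in maps for a 640x360 frame:
h, w = 360, 640
rng = np.random.default_rng(0)
spatial = rng.random((h, w))
temporal = rng.random((h, w))
faces = [(420, 180, 40, True),   # localized speaker
         (180, 200, 40, False)]  # silent listener
auditory = auditory_map_from_faces((h, w), faces)
saliency = fuse_saliency_maps(spatial, temporal, auditory)

The linear, per-map normalized fusion is only one reasonable choice; the key idea carried over from the abstract is that the auditory channel is not uniform but concentrated on faces, with the detected speaker weighted more heavily than listeners.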
File not deposited

Dates and versions

hal-01355968 , version 1 (24-08-2016)

Identifiers

Cite

Naty Sidaty, Mohamed-Chaker Larabi, Abdelhakim Saadane. Toward an audiovisual attention model for multimodal video content. Neurocomputing, 2017, 259, pp. 94-111. ⟨10.1016/j.neucom.2016.08.130⟩. ⟨hal-01355968⟩