Conference paper, 2017

Temporal Multimodal Fusion for Video Emotion Classification in the Wild

Abstract

This paper addresses the question of emotion classification. The task consists in predicting emotion labels (taken from a set of possible labels) that best describe the emotions contained in short video clips. Building on a standard framework, in which videos are described by audio and visual features that a supervised classifier uses to infer the labels, this paper investigates several novel directions. First, improved face descriptors based on 2D and 3D Convolutional Neural Networks (CNNs) are proposed. Second, the paper explores several temporal and multimodal fusion methods, including a novel hierarchical method combining features and scores. In addition, we carefully reviewed the different stages of the pipeline and designed a CNN architecture adapted to the task; this is important because the training set is small relative to the difficulty of the problem, making generalization difficult. The resulting model ranked 4th at the 2017 Emotion in the Wild challenge with an accuracy of 58.8%.
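To make the fusion idea concrete, below is a minimal PyTorch sketch of one such hierarchical scheme: per-frame visual features are temporally fused by average pooling, each modality also produces its own emotion scores, and a final layer combines feature-level and score-level predictions. All dimensions, layer sizes, and names here (AUDIO_DIM, VISUAL_DIM, HierarchicalFusion, etc.) are illustrative assumptions for exposition, not the architecture actually used in the paper.

import torch
import torch.nn as nn

AUDIO_DIM, VISUAL_DIM, NUM_CLASSES = 1582, 4096, 7  # assumed sizes, not from the paper

class HierarchicalFusion(nn.Module):
    def __init__(self):
        super().__init__()
        # Score-level branch: one emotion classifier per modality.
        self.audio_head = nn.Linear(AUDIO_DIM, NUM_CLASSES)
        self.visual_head = nn.Linear(VISUAL_DIM, NUM_CLASSES)
        # Feature-level branch: concatenated features feed a shared classifier.
        self.feature_fusion = nn.Sequential(
            nn.Linear(AUDIO_DIM + VISUAL_DIM, 256),
            nn.ReLU(),
            nn.Linear(256, NUM_CLASSES),
        )
        # Hierarchical step: combine the three score vectors into final scores.
        self.score_fusion = nn.Linear(3 * NUM_CLASSES, NUM_CLASSES)

    def forward(self, audio_feat, visual_frames):
        # Temporal fusion of per-frame visual features by average pooling
        # (one simple option among possible temporal fusion methods).
        visual_feat = visual_frames.mean(dim=1)
        s_audio = self.audio_head(audio_feat)
        s_visual = self.visual_head(visual_feat)
        s_feat = self.feature_fusion(torch.cat([audio_feat, visual_feat], dim=1))
        return self.score_fusion(torch.cat([s_audio, s_visual, s_feat], dim=1))

model = HierarchicalFusion()
logits = model(torch.randn(8, AUDIO_DIM), torch.randn(8, 16, VISUAL_DIM))
print(logits.shape)  # torch.Size([8, 7])

A plain average of per-modality softmax scores would be an even simpler baseline; the hierarchical variant sketched above instead lets the model learn how much to trust feature-level versus score-level evidence.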
Main file: main.pdf (1.76 MB)
Origin: files produced by the author(s)

Dates and versions

hal-01590608, version 1 (20-09-2017)


Cite

Valentin Vielzeuf, Stéphane Pateux, Frédéric Jurie. Temporal Multimodal Fusion for Video Emotion Classification in the Wild. ACM ICMI 2017, Nov 2017, Glasgow, United Kingdom. ⟨hal-01590608⟩