MFAS: Multimodal Fusion Architecture Search

Abstract: We tackle the problem of finding good architectures for multimodal classification tasks. We propose a novel, generic search space that spans a large number of possible fusion architectures. To find an optimal architecture for a given dataset within this space, we leverage an efficient sequential model-based exploration approach tailored to the problem. We demonstrate the value of posing multimodal fusion as a neural architecture search problem through extensive experimentation on a toy dataset and two real multimodal datasets. We discover fusion architectures that exhibit state-of-the-art performance on problems of varying domain and dataset size, including the NTU RGB+D dataset, the largest multimodal action recognition dataset available.
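To give an intuition for the sequential model-based exploration mentioned in the abstract, here is a minimal, hypothetical sketch of such a loop. It is not the paper's algorithm or search space: the toy space (which hidden layer of each of two modalities feeds a fusion block), the `evaluate` objective, and the coordinate-averaging surrogate are all illustrative assumptions. The key pattern shown is real, though: evaluate a few random architectures, fit a cheap surrogate on the results, then spend the remaining budget on the architectures the surrogate ranks highest.

```python
import itertools
import random

random.seed(0)

# Hypothetical toy search space: for each of two modalities, choose which of
# three hidden layers feeds the fusion block (purely illustrative).
SPACE = list(itertools.product(range(3), range(3)))


def evaluate(arch):
    """Stand-in for training a fusion network and reading off validation
    accuracy. Deterministic toy objective: fusing layers of similar depth
    scores best (an assumption made for this sketch only)."""
    v, a = arch
    return 1.0 - 0.2 * abs(v - a)


def surrogate_predict(history, arch):
    """Deliberately simple surrogate: predict an architecture's score as the
    mean score of previously evaluated architectures sharing a coordinate."""
    scores = [s for (h, s) in history if h[0] == arch[0] or h[1] == arch[1]]
    return sum(scores) / len(scores) if scores else 0.0


def sequential_search(budget=6, warmup=3):
    history = []
    # Warm-up: evaluate a few random architectures to seed the surrogate.
    for arch in random.sample(SPACE, warmup):
        history.append((arch, evaluate(arch)))
    # Sequential phase: always evaluate the as-yet-untried architecture the
    # surrogate currently ranks highest, then refit on the enlarged history.
    for _ in range(budget - warmup):
        tried = {h for h, _ in history}
        candidates = [a for a in SPACE if a not in tried]
        best = max(candidates, key=lambda a: surrogate_predict(history, a))
        history.append((best, evaluate(best)))
    return max(history, key=lambda x: x[1])


best_arch, best_score = sequential_search()
print(best_arch, best_score)
```

The appeal of this pattern for fusion search is that each true evaluation (training a network) is expensive, while the surrogate makes ranking the remaining candidates essentially free.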
Contributor: Valentin Vielzeuf
Submitted on: Thursday, March 14, 2019 - 5:43:03 PM
Last modification on: Tuesday, April 2, 2019 - 1:35:13 AM
Long-term archiving on: Saturday, June 15, 2019 - 4:11:15 PM


  • HAL Id: hal-02068293, version 1
  • arXiv: 1903.06496


Juan-Manuel Pérez-Rúa, Valentin Vielzeuf, Stéphane Pateux, Moez Baccouche, Frédéric Jurie. MFAS: Multimodal Fusion Architecture Search. CVPR 2019, Jun 2019, Long Beach, United States. ⟨hal-02068293⟩


