
Modout: Learning to Fuse Face and Gesture Modalities with Stochastic Regularization

Abstract : Model selection methods based on stochastic regularization such as Dropout have been widely used in deep learning due to their simplicity and effectiveness. The standard Dropout method treats all units, visible or hidden, in the same way, thus ignoring any a priori information related to grouping or structure. Such structure is present in multi-modal learning applications such as affect analysis and gesture recognition, where subsets of units may correspond to individual modalities. In this paper we describe Modout, a model selection method based on stochastic regularization, which is particularly useful in the multi-modal setting. Different from previous methods, it is capable of learning whether or when to fuse two modalities in a layer, which is usually considered to be an architectural hyper-parameter by deep learning researchers and practitioners. Modout is evaluated on one synthetic and two real multi-modal datasets. The results indicate improved performance compared to other stochastic regularization methods. The result on the Montalbano dataset shows that learning a fusion structure by Modout is on par with a state-of-the-art carefully designed architecture.
Document type :
Conference papers
Contributor : Christian Wolf
Submitted on : Tuesday, January 24, 2017 - 11:27:01 AM
Last modification on : Tuesday, June 1, 2021 - 2:08:09 PM


  • HAL Id : hal-01444614, version 1


Fan Li, Natalia Neverova, Christian Wolf, Graham W. Taylor. Modout: Learning to Fuse Face and Gesture Modalities with Stochastic Regularization. International Conference on Automatic Face and Gesture Recognition, May 2017, Washington D.C., United States. ⟨hal-01444614⟩


