Skip to Main content Skip to Navigation
Conference papers

A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention

Grégoire Mialon 1, 2 Dexiong Chen 1 Alexandre d'Aspremont 2 Julien Mairal 1
1 Thoth - Apprentissage de modèles à partir de données massives
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann
2 SIERRA - Statistical Machine Learning and Parsimony
DI-ENS - Département d'informatique - ENS Paris, CNRS - Centre National de la Recherche Scientifique, Inria de Paris
Abstract : We address the problem of learning on sets of features, motivated by the need of performing pooling operations in long biological sequences of varying sizes, with long-range dependencies, and possibly few labeled data. To address this challenging task, we introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference. Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost. Our aggregation technique admits two useful interpretations: it may be seen as a mechanism related to attention layers in neural networks, or it may be seen as a scalable surrogate of a classical optimal transport-based kernel. We experimentally demonstrate the effectiveness of our approach on biological sequences, achieving state-of-the-art results for protein fold recognition and detection of chromatin profiles tasks, and, as a proof of concept, we show promising results for processing natural language sequences. We provide an open-source implementation of our embedding that can be used alone or as a module in larger learning models at
Complete list of metadata
Contributor : Grégoire Mialon Connect in order to contact the contributor
Submitted on : Tuesday, February 9, 2021 - 3:05:23 PM
Last modification on : Tuesday, January 11, 2022 - 11:16:06 AM


Files produced by the author(s)


  • HAL Id : hal-02883436, version 3



Grégoire Mialon, Dexiong Chen, Alexandre d'Aspremont, Julien Mairal. A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention. ICLR 2021 - The Ninth International Conference on Learning Representations, May 2021, Virtual, France. ⟨hal-02883436v3⟩



Les métriques sont temporairement indisponibles