Convolution, attention and structure embedding, 2019.
Neural machine translation by jointly learning to align and translate, International Conference on Learning Representations (ICLR), 2015.
Histogram intersection kernel for image classification, Proceedings 2003 International Conference on Image Processing, p.513, 2003.
On the Bures-Wasserstein distance between positive definite matrices, Expositiones Mathematicae, 2018.
Non-Local Means Denoising, Image Processing On Line, vol.1, pp.208-212, 2011.
Biological sequence modeling with convolutional kernel networks, Bioinformatics, issue.18, pp.3294-3302, 2019. URL: https://hal.archives-ouvertes.fr/hal-01632912
Recurrent kernel networks, Advances in Neural Information Processing Systems (NeurIPS), 2019. URL: https://hal.archives-ouvertes.fr/hal-02151135
Improving textual network embedding with global attention via optimal transport, Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
On the relationship between self-attention and convolutional layers, International Conference on Learning Representations (ICLR), 2020.
Sinkhorn distances: Lightspeed computation of optimal transport, Advances in Neural Information Processing Systems (NeurIPS), 2013.
Fast computation of Wasserstein barycenters, International Conference on Machine Learning (ICML), 2013.
Transformer-XL: Attentive language models beyond a fixed-length context, Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
BERT: Pre-training of deep bidirectional transformers for language understanding, Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), 2019.
The pyramid match kernel: Efficient learning with sets of features, Journal of Machine Learning Research, pp.725-760, 2007.
DeepSF: deep convolutional neural network for mapping protein sequences to folds, Bioinformatics, issue.8, pp.1295-1303, 2019.
On the burstiness of visual elements, Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
Using the Nyström method to speed up kernel machines, Advances in Neural Information Processing Systems (NeurIPS), 2001.
Reformer: The efficient transformer, International Conference on Learning Representations (ICLR), 2020.
Scalable algorithms for string kernels with inexact matching, Advances in Neural Information Processing Systems (NeurIPS), 2009.
From word embeddings to document distances, International Conference on Machine Learning (ICML), 2015.
The spectrum kernel: a string kernel for SVM protein classification, Proceedings of the Pacific Symposium on Biocomputing, pp.564-575, 2002.
Mismatch string kernels for discriminative protein classification, Bioinformatics, vol.20, issue.4, pp.467-476, 2004.
Mercer kernels for object recognition with local features, Conference on Computer Vision and Pattern Recognition (CVPR), 2004.
End-to-end kernel learning with supervised convolutional kernel networks, Advances in Neural Information Processing Systems (NeurIPS), 2016.
Cyanure: An open-source toolbox for empirical risk minimization for Python, C++, and soon more, 2019.
Convolutional kernel networks, Advances in Neural Information Processing Systems (NeurIPS), 2014. URL: https://hal.archives-ouvertes.fr/hal-01005489
Are sixteen heads really better than one?, Advances in Neural Information Processing Systems (NeurIPS), 2019.
Quantitative stability of optimal transport maps and linearization of the 2-Wasserstein space, International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.
Computational optimal transport, Foundations and Trends in Machine Learning, vol.11, pp.355-607, 2019.
Exploring the limits of transfer learning with a unified text-to-text transformer, 2019.
Fixed encoder self-attention patterns in transformer-based machine translation, 2020.
Stand-alone self-attention in vision models, Advances in Neural Information Processing Systems (NeurIPS), 2019.
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences, bioRxiv 622803, 2019.
The earth mover's distance as a metric for image retrieval, International Journal of Computer Vision, vol.40, pp.99-121, 2000.
Learning with kernels: support vector machines, regularization, optimization, and beyond, 2001.
Recursive deep models for semantic compositionality over a sentiment treebank, Conference on Empirical Methods in Natural Language Processing (EMNLP), 2013.
Wasserstein Weisfeiler-Lehman graph kernels, Advances in Neural Information Processing Systems (NeurIPS), 2019.
To aggregate or not to aggregate: Selective match kernels for image search, Proceedings of the International Conference on Computer Vision (ICCV), 2013. URL: https://hal.archives-ouvertes.fr/hal-00864684
Transformer dissection: A unified understanding of transformer's attention via the lens of kernel, Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
Attention is all you need, Advances in Neural Information Processing Systems (NeurIPS), 2017.
Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned, Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
GLUE: a multi-task benchmark and analysis platform for natural language understanding, International Conference on Learning Representations (ICLR), 2019.
A linear optimal transportation framework for quantifying and visualizing variations in sets of images, International Journal of Computer Vision, vol.101, issue.2, pp.254-269, 2013.
Non-local neural networks, Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
Hard-coded Gaussian attention for neural machine translation, Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2020.
HuggingFace's Transformers: State-of-the-art natural language processing, 2019.
Predicting effects of noncoding variants with deep learning-based sequence model, Nature Methods, vol.12, issue.10, pp.931-934, 2015.
On the definiteness of earth mover's distance and its relation to set intersection, IEEE Transactions on Cybernetics, vol.48, issue.11, pp.3184-3196, 2018.