Efficient convolution optimisation by composing micro-kernels

Nicolas Tollenaere; Auguste Olivry; Guillaume Iooss; Hugo Brunie; Albert Cohen; P Sadayappan; Fabrice Rastello

Preprint, Working Paper. Year: 2021


Nicolas Tollenaere

Role: Author

Compiler Optimization and Run-time Systems

Auguste Olivry

Role: Author

Compiler Optimization and Run-time Systems

Guillaume Iooss

Role: Author

Compiler Optimization and Run-time Systems

Hugo Brunie

Role: Author

Lawrence Berkeley National Laboratory [Berkeley]

Albert Cohen

Role: Author
PersonId: 6894
IdHAL: acohen
ORCID: 0000-0002-8866-5343
IdRef: 067155898

Google France

P Sadayappan

Role: Author

University of Utah

Fabrice Rastello

Role: Author
PersonId: 2883
IdHAL: fabrice-rastello
IdRef: 149155727

Compiler Optimization and Run-time Systems

Abstract

Optimizing the implementation of tensor computations is essential to exploiting the full capacity of a given processor architecture on a wide range of scientific and machine learning applications. However, the complexity of the microarchitectural features that come into play when approaching the peak performance of the processor makes this very hard. Focusing on 2D convolutions, we observe a common weakness in all tensor compilers and libraries related to efficiently covering the wide variety of problem sizes occurring in real-world applications. We propose TTile, a domain-specific code generator and autotuner for implementing efficient convolutions. Similarly to BLIS [30], TTile nests multiple levels of tiling above a vectorized tensor contraction microkernel. But unlike traditional approaches, we explore a variety of microkernels and compose them to fit exactly the tensor shapes of a convolution. While this helps achieve consistently high performance on virtually all possible tensor sizes, our method also introduces more degrees of freedom in the optimization space, which makes it challenging for autotuning strategies. To address this, we leverage an analytical model of data movement [22, 25], and combine it with feedback-directed autotuning. We evaluate TTile as a stand-alone compiler and also as a complement to TVM [8] on recent Intel x86 microarchitectures.
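To illustrate what "composing microkernels to fit exactly the tensor shapes" can mean along a single dimension, here is a minimal sketch. It is not TTile's algorithm or API: the function name `compose_two_kernels` and the micro-kernel sizes are hypothetical, and the sketch only assumes that two micro-kernels with different tile sizes are repeated so that their sizes add up exactly to the extent of the dimension, leaving no partial tile.

```python
from typing import Optional, Tuple


def compose_two_kernels(extent: int, size_a: int, size_b: int) -> Optional[Tuple[int, int]]:
    """Find repetition counts (count_a, count_b) such that
    count_a * size_a + count_b * size_b == extent.

    Hypothetical helper for illustration only. It prefers using as many
    tiles of size_a as possible and returns None when no exact cover exists.
    """
    for count_a in range(extent // size_a, -1, -1):
        remainder = extent - count_a * size_a
        if remainder % size_b == 0:
            return count_a, remainder // size_b
    return None


# A dimension of extent 34 is not a multiple of 6 or of 5,
# but it can be covered exactly by 4 tiles of 6 and 2 tiles of 5.
print(compose_two_kernels(34, 6, 5))
```

With a single micro-kernel size, an extent such as 34 would leave a remainder that needs padding or a slower cleanup loop; allowing a second size removes that remainder in this example, at the cost of more candidate combinations to search.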

Domains

Performance [cs.PF]; Distributed, Parallel, and Cluster Computing [cs.DC]

Main file

Ttile_HALversion_Oct2021.pdf (795.96 KB)

Origin: Files produced by the author(s)

Contributor: Guillaume Iooss

https://hal.science/hal-03149553

Submitted on: Thursday, October 14, 2021, 12:06:21

Last modified on: Thursday, April 4, 2024, 21:22:31

Dates and versions

hal-03149553 , version 1 (23-02-2021)

hal-03149553 , version 2 (09-04-2021)

hal-03149553 , version 3 (14-10-2021)

Identifiers

HAL Id : hal-03149553 , version 3

Cite

Nicolas Tollenaere, Auguste Olivry, Guillaume Iooss, Hugo Brunie, Albert Cohen, et al.. Efficient convolution optimisation by composing micro-kernels. 2021. ⟨hal-03149553v3⟩


Collections

UNIV-RENNES1 UGA CNRS INRIA IRISA LIG LIG_SRCPR INRIA2 LIG-SRCPR-CORSE UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM LIG_SIDCH
