Preprints, Working Papers, ...

Efficient convolution optimisation by composing micro-kernels

Abstract : Optimizing the implementation of tensor computations is essential to exploiting the full capacity of a given processor architecture on a wide range of scientific and machine learning applications. However, the complexity of the microarchitectural features that come into play when approaching the peak performance of the processor makes this task very hard. Focusing on 2D convolutions, we observe a common weakness in all tensor compilers and libraries related to efficiently covering the wide variety of problem sizes occurring in real-world applications. We propose TTile, a domain-specific code generator and autotuner for implementing efficient convolutions. Similarly to BLIS [30], TTile nests multiple levels of tiling above a vectorized tensor contraction microkernel. But unlike traditional approaches, we explore a variety of microkernels and compose them to fit exactly the tensor shapes of a convolution. While this helps achieve consistently high performance on virtually all possible tensor sizes, our method also introduces more degrees of freedom in the optimization space, which is challenging for autotuning strategies. To address this, we leverage an analytical model of data movement [22, 25], and combine it with feedback-directed autotuning. We evaluate TTile as a stand-alone compiler and also as a complement to TVM [8] on recent Intel x86 microarchitectures.
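To make the idea of composing microkernels to fit a tensor dimension exactly more concrete, here is a minimal Python sketch. It is an illustration only and not TTile's actual algorithm: the function name `compose_microkernels` and the example microkernel sizes are assumptions introduced here, and the selection criterion (fewest microkernel invocations) is a stand-in for the analytical model and autotuning described in the abstract.

```python
def compose_microkernels(extent, sizes):
    """Cover `extent` exactly with microkernels of the given sizes.

    Returns a dict {size: count} with sum(size * count) == extent that uses
    the fewest microkernel invocations, or None if no exact cover exists.
    Hypothetical helper for illustration; not taken from the paper.
    """
    # best[e] holds the best known composition covering exactly e elements.
    best = [None] * (extent + 1)
    best[0] = {}
    for e in range(1, extent + 1):
        for s in sizes:
            prev = best[e - s] if e >= s else None
            if prev is None:
                continue  # e - s cannot be covered exactly
            cand = dict(prev)
            cand[s] = cand.get(s, 0) + 1
            if best[e] is None or sum(cand.values()) < sum(best[e].values()):
                best[e] = cand
    return best[extent]


# A dimension of 34 is not a multiple of 12 or of 10, but two kernels of
# size 12 plus one of size 10 cover it with no leftover iterations.
composition = compose_microkernels(34, [12, 10])
```

With a single microkernel size, a dimension such as 34 would leave a partial tile that needs padding or a slower remainder loop; mixing two sizes removes that remainder in this toy setting.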

https://hal.archives-ouvertes.fr/hal-03149553
Contributor : Guillaume Iooss
Submitted on : Thursday, October 14, 2021 - 12:06:21 PM
Last modification on : Tuesday, August 2, 2022 - 4:24:43 AM

File

Ttile_HALversion_Oct2021.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03149553, version 3

Citation

Nicolas Tollenaere, Auguste Olivry, Guillaume Iooss, Hugo Brunie, Albert Cohen, et al. Efficient convolution optimisation by composing micro-kernels. 2021. ⟨hal-03149553v3⟩

Metrics

  • Record views : 540
  • File downloads : 828