HAL : ensl-00649650, version 2

39th Annual International Symposium on Computer Architecture (ISCA), Portland, OR, United States
Simultaneous Branch and Warp Interweaving for Sustained GPU Performance
Nicolas Brunie 1, 2, 3, Sylvain Collange 4, Gregory Diamos 5
(11 June 2012)

Single-Instruction Multiple-Thread (SIMT) micro-architectures implemented in Graphics Processing Units (GPUs) run fine-grained threads in lockstep by grouping them into units, referred to as warps, to amortize the cost of instruction fetch, decode and control logic over multiple execution units. As individual threads take divergent execution paths, their processing takes place sequentially, defeating part of the efficiency advantage of SIMD execution. We present two complementary techniques that mitigate the impact of thread divergence on SIMT micro-architectures. Both techniques relax the SIMD execution model by allowing two distinct instructions to be scheduled to disjoint subsets of the same row of execution units, instead of one single instruction. They increase flexibility by providing more thread grouping opportunities than SIMD, while preserving the affinity between threads to avoid introducing extra memory divergence. We consider (1) co-issuing instructions from different divergent paths of the same warp and (2) co-issuing instructions from different warps. To support (1), we introduce a novel thread reconvergence technique that ensures threads are brought back into lockstep at control-flow reconvergence points without hindering their ability to run branches in parallel. We propose a lane shuffling technique to allow solution (2) to benefit from inter-warp correlations in divergence patterns. The combination of all these techniques improves performance by 23% on a set of regular GPGPU applications and by 40% on irregular applications, while maintaining the same instruction-fetch and processing-unit resource requirements as the contemporary Fermi GPU architecture.
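The abstract frames both techniques in terms of which lanes of a row each instruction occupies. The following minimal Python sketch is not the authors' implementation; it models only that one constraint, namely that two instructions can share a row of execution units when their active-lane masks are disjoint. It assumes 32-lane warps, ignores dependencies, latencies and the paper's reconvergence mechanism, and uses a hypothetical mirror mapping (odd warps place thread t on lane 31 - t) as a stand-in for the paper's lane shuffling. All function names are illustrative.

```python
WIDTH = 32  # assumed lanes per row / threads per warp (illustrative, not from the paper)


def can_coissue(mask_a: int, mask_b: int) -> bool:
    """Two instructions may share one row of lanes only if their active-lane masks are disjoint."""
    return (mask_a & mask_b) == 0


def simd_issue_slots(masks: list[int]) -> int:
    """Baseline SIMD: every non-empty instruction occupies the whole row for one issue slot."""
    return sum(1 for m in masks if m)


def paired_issue_slots(masks: list[int]) -> int:
    """Simplified co-issue model (assumption): greedily pair each pending instruction with
    the first later one whose lanes are disjoint. A pair or a leftover single costs one
    issue slot. Dependencies and latencies are ignored."""
    pending = [m for m in masks if m]
    slots = 0
    while pending:
        first = pending.pop(0)
        for i, other in enumerate(pending):
            if can_coissue(first, other):
                pending.pop(i)  # safe: we break right after mutating the list
                break
        slots += 1
    return slots


def mirror_lanes(mask: int, warp_id: int, width: int = WIDTH) -> int:
    """Hypothetical static lane shuffle: even warps keep thread t on lane t, odd warps
    place thread t on lane width-1-t. The mapping is fixed per warp, so a thread
    never moves to a different lane over time."""
    if warp_id % 2 == 0:
        return mask
    out = 0
    for t in range(width):
        if (mask >> t) & 1:
            out |= 1 << (width - 1 - t)
    return out


if __name__ == "__main__":
    LOW, HIGH = 0x0000FFFF, 0xFFFF0000  # threads 0-15 vs. threads 16-31 active
    # (1) Branch interweaving: the two sides of one warp's if/else.
    print(simd_issue_slots([LOW, HIGH]), paired_issue_slots([LOW, HIGH]))  # 2 1
    # (2) Warp interweaving: two warps with the same (correlated) divergence pattern.
    print(can_coissue(LOW, LOW))  # False
    print(can_coissue(mirror_lanes(LOW, 0), mirror_lanes(LOW, 1)))  # True
```

In this toy model, two warps whose threads 0-15 are active collide on the same lanes and cannot be co-issued; after the fixed mirror mapping they occupy opposite halves of the row and fit in one issue slot. This is the kind of inter-warp correlation in divergence patterns that the abstract says lane shuffling is meant to exploit.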
1: Kalray
Kalray
2: Laboratoire de l'Informatique du Parallélisme (LIP)
Université de Lyon – CNRS : UMR5668 – INRIA – École Normale Supérieure - Lyon – Université Claude Bernard - Lyon I
3: ARIC (Inria Grenoble Rhône-Alpes / LIP Laboratoire de l'Informatique du Parallélisme)
INRIA – CNRS : UMR5668 – Université Claude Bernard - Lyon I – École Normale Supérieure - Lyon
4: Departamento de Ciência da Computação [Minas Gerais] (DCC - UFMG)
Universidade Federal de Minas Gerais
5: NVIDIA (NVIDIA)
NVIDIA Corp.
ARENAIRE - Arithmétique des ordinateurs
Computer Science/Architecture
Files attached to this document:
PDF
sbiswi.pdf (565.6 KB)
