Rodinia: A benchmark suite for heterogeneous computing, 2009 IEEE International Symposium on Workload Characterization (IISWC), pp.44-54, 2009.
DOI : 10.1109/IISWC.2009.5306797
Barra: A Parallel Functional Simulator for GPGPU, 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pp.351-360, 2010.
DOI : 10.1109/MASCOTS.2010.43
Dynamic Detection of Uniform and Affine Vectors in GPGPU Computations, Euro-Par 3rd Workshop on Highly Parallel Processing on a Chip (HPPC), LNCS vol.6043, pp.46-55, 2009.
DOI : 10.1007/978-3-642-14122-5_8
URL : https://hal.archives-ouvertes.fr/hal-00396719
Affine Vector Cache for memory bandwidth savings, 2011.
URL : https://hal.archives-ouvertes.fr/ensl-00649200
Control Flow Optimization Via Dynamic Reconvergence Prediction, 37th International Symposium on Microarchitecture (MICRO-37'04), pp.129-140, 2004.
DOI : 10.1109/MICRO.2004.13
Tracking register usage during multithreaded processing using a scoreboard having separate memory regions and storing sequential register size indicators, US Patent 7434032, 2008.
PEPSC: A Power-Efficient Processor for Scientific Computing, 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011.
DOI : 10.1109/PACT.2011.16
Multithreaded instruction sharing, 2010.
SIMD re-convergence at thread frontiers, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44 '11, 2011.
DOI : 10.1145/2155620.2155676
CASH: Revisiting hardware sharing in single-chip parallel processor, Journal of Instruction-Level Parallelism, vol.6, pp.1-16, 2004.
URL : https://hal.archives-ouvertes.fr/inria-00071925
Multithreaded vector architectures, Proceedings Third International Symposium on High-Performance Computer Architecture, pp.237-244, 1997.
DOI : 10.1109/HPCA.1997.569677
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.5271
Towards solving the table maker dilemma on GPU, 20th Euromicro International Conference on Parallel, Distributed and Network-Based Computing, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00642337
Thread block compaction for efficient SIMT control flow, 2011 IEEE 17th International Symposium on High Performance Computer Architecture, pp.25-36, 2011.
DOI : 10.1109/HPCA.2011.5749714
Dynamic warp formation, ACM Transactions on Architecture and Code Optimization, vol.6, issue.2, pp.1-7, 2009.
DOI : 10.1145/1543753.1543756
Energy-efficient mechanisms for managing thread context in throughput processors, Proceedings of the 38th Annual International Symposium on Computer Architecture, pp.235-246, 2011.
Coherent vector lane threading, 2009.
The vector-thread architecture, IEEE Micro, vol.24, issue.6, pp.84-90, 2004.
DOI : 10.1109/MM.2004.90
Conjoined-core chip multiprocessing, IEEE/ACM International Symposium on Microarchitecture, pp.195-206, 2004.
DOI : 10.1109/micro.2004.12
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.111.7776
Performance in GPU architectures: Potentials and distances, 9th Annual Workshop on Duplicating, Deconstructing, and Debunking (WDDD11), in conjunction with ISCA-38, 2011.
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU, ISCA '10: Proceedings of the 37th Annual International Symposium on Computer Architecture, pp.451-460, 2010.
NVIDIA Tesla: A Unified Graphics and Computing Architecture, IEEE Micro, vol.28, issue.2, pp.39-55, 2008.
DOI : 10.1109/MM.2008.31
Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pp.337-348, 2010.
DOI : 10.1109/MICRO.2010.41
Dynamic warp subdivision for integrated branch and memory divergence tolerance, ACM SIGARCH Computer Architecture News, vol.38, issue.3, pp.235-246, 2010.
DOI : 10.1145/1816038.1815992
Scheduler in multi-threaded processor prioritizing instructions passing qualification rule, 2011.
Improving GPU performance via large warps and two-level warp scheduling, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44 '11, 2011.
DOI : 10.1145/2155620.2155656
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.357.9840
The GPU Computing Era, IEEE Micro, vol.30, issue.2, pp.56-69, 2010.
DOI : 10.1109/MM.2010.41
Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey, IEEE Journal of Solid-State Circuits, vol.41, issue.3, pp.712-727, 2006.
DOI : 10.1109/JSSC.2005.864128
Vector Lane Threading, 2006 International Conference on Parallel Processing (ICPP'06), pp.55-64, 2006.
DOI : 10.1109/ICPP.2006.74
Simultaneous multithreading, ACM SIGARCH Computer Architecture News, vol.23, issue.2, pp.392-403, 1995.
DOI : 10.1145/225830.224449