Dynamic interthread vectorization architecture: extracting DLP from TLP, International Symposium on Computer Architecture and High-Performance Computing (SBAC-PAD), 2016. ,
DOI : 10.1109/sbac-pad.2016.11
URL : https://hal.archives-ouvertes.fr/hal-01356202
Simultaneous multithreading: Maximizing on-chip parallelism, Proceedings of the 22nd Annual International Symposium on Computer Architecture, ISCA '95, pp.392-403, 1995. ,
Exploiting choice, ACM SIGARCH Computer Architecture News, vol.24, issue.2, pp.191-202, 1996. ,
DOI : 10.1145/232974.232993
The PARSEC benchmark suite, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, PACT '08, pp.72-81, 2008. ,
DOI : 10.1145/1454115.1454128
Rodinia: A benchmark suite for heterogeneous computing, 2009 IEEE International Symposium on Workload Characterization (IISWC), pp.44-54, 2009. ,
DOI : 10.1109/IISWC.2009.5306797
Design tradeoffs for the Alpha EV8 conditional branch predictor, 29th International Symposium on Computer Architecture, pp.295-306, 2002. ,
Branch prediction and simultaneous multithreading, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique, pp.169-173, 1996. ,
DOI : 10.1109/PACT.1996.552664
URL : https://hal.archives-ouvertes.fr/inria-00073847
Out-of-order execution may not be cost-effective on processors featuring simultaneous multithreading, Proceedings Fifth International Symposium on High-Performance Computer Architecture, pp.64-67, 1999. ,
DOI : 10.1109/HPCA.1999.744331
URL : https://hal.archives-ouvertes.fr/inria-00073298
Thread scheduling and memory coalescing for dynamic vectorization of SPMD workloads, Parallel Computing, vol.40, issue.9, pp.548-558, 2014. ,
DOI : 10.1016/j.parco.2014.03.006
URL : https://hal.archives-ouvertes.fr/hal-01087054
The CRAY-1 computer system, Communications of the ACM, vol.21, issue.1, pp.63-72, 1978. ,
DOI : 10.1145/359327.359336
Whole-function vectorization, Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp.141-150, 2011. ,
DOI : 10.1109/cgo.2011.5764682
URL : http://www.intel-vci.uni-saarland.de/uploads/tx_sibibtex/10.pdf
Exploring the Design Space of SPMD Divergence Management on Data-Parallel Architectures, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp.101-113, 2014. ,
DOI : 10.1109/MICRO.2014.48
A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators, European Solid State Circuits Conference, pp.199-202, 2014. ,
The GPU Computing Era, IEEE Micro, vol.30, issue.2, pp.56-69, 2010. ,
DOI : 10.1109/MM.2010.41
SIMD re-convergence at thread frontiers, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44 '11, 2011. ,
DOI : 10.1145/2155620.2155676
iGPU, ACM SIGARCH Computer Architecture News, vol.40, issue.3, pp.72-83, 2012. ,
DOI : 10.1145/2366231.2337168
Dynamic warp formation, ACM Transactions on Architecture and Code Optimization, vol.6, issue.2, pp.1-7, 2009. ,
DOI : 10.1145/1543753.1543756
Simultaneous branch and warp interweaving for sustained GPU performance, ACM SIGARCH Computer Architecture News, vol.40, issue.3, pp.49-60, 2012. ,
DOI : 10.1145/2366231.2337166
URL : https://hal.archives-ouvertes.fr/ensl-00649650
HARP, ACM Transactions on Embedded Computing Systems, vol.13, issue.3s, pp.13-16, 2014. ,
DOI : 10.1007/s02011-011-1137-8
Exploiting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor, Proceedings of the 23rd Annual International Symposium on Computer Architecture, pp.191-202, 1996. ,
Dynamically Controlled Resource Allocation in SMT Processors, 37th International Symposium on Microarchitecture (MICRO-37'04), pp.171-182, 2004. ,
DOI : 10.1109/MICRO.2004.17
Front-end policies for improved issue efficiency in SMT processors, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings., pp.31-40, 2003. ,
DOI : 10.1109/HPCA.2003.1183522
A memory-level parallelism aware fetch policy for SMT processors, 13th International Conference on High-Performance Computer Architecture, pp.240-249, 2007. ,
Compiling C* programs for a hypercube multicomputer, ACM SIGPLAN Notices, vol.23, issue.9, pp.57-65, 1988. ,
DOI : 10.1145/62116.62122
Stack-less SIMT reconvergence at low cost, Tech. rep., HAL, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00622654
Conjoined-Core Chip Multiprocessing, 37th International Symposium on Microarchitecture (MICRO-37'04), pp.195-206, 2004. ,
DOI : 10.1109/MICRO.2004.12
Thread fusion, Proceeding of the thirteenth international symposium on Low power electronics and design, ISLPED '08, pp.363-368, 2008. ,
DOI : 10.1145/1393921.1394018
Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pp.337-348, 2010. ,
DOI : 10.1109/MICRO.2010.41
Multithreaded instruction sharing ,
Execution Drafting: Energy Efficiency through Computation Deduplication, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp.432-444, 2014. ,
DOI : 10.1109/MICRO.2014.43
Some Computer Organizations and Their Effectiveness, IEEE Transactions on Computers, vol.21, issue.9, pp.948-960, 1972. ,
DOI : 10.1109/TC.1972.5009071
Dynamic warp subdivision for integrated branch and memory divergence tolerance, ACM SIGARCH Computer Architecture News, vol.38, issue.3, pp.235-246, 2010. ,
DOI : 10.1145/1816038.1815992
Pin, ACM SIGPLAN Notices, vol.40, issue.6, pp.190-200, 2005. ,
DOI : 10.1145/1064978.1065034
A new case for the TAGE branch predictor, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44 '11, pp.117-127, 2011. ,
DOI : 10.1145/2155620.2155635
URL : https://hal.archives-ouvertes.fr/hal-00639193
Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling, 2009 IEEE International Conference on Computer Design, pp.282-288, 2009. ,
DOI : 10.1109/ICCD.2009.5413143
Highlights of the High-Bandwidth Memory (HBM) standard, in: Memory Forum Workshop, 2014. ,
McPAT, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, Micro-42, pp.469-480, 2009. ,
DOI : 10.1145/1669112.1669172
Quantifying sources of error in McPAT and potential impacts on architectural studies, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pp.577-589, 2015. ,
DOI : 10.1109/HPCA.2015.7056064
Dynamic Detection of Uniform and Affine Vectors in GPGPU Computations, Europar 3rd Workshop on Highly Parallel Processing on a Chip (HPPC), pp.46-55, 2009. ,
DOI : 10.1007/978-3-642-14122-5_8
URL : https://hal.archives-ouvertes.fr/hal-00396719