Vector microprocessors, 1998.
Rodinia: A benchmark suite for heterogeneous computing, 2009 IEEE International Symposium on Workload Characterization (IISWC), pp.44-54, 2009.
DOI : 10.1109/IISWC.2009.5306797
Dynamic detection of uniform and affine vectors in GPGPU computations, Europar 3rd Workshop on Highly Parallel Processing on a Chip, 2009.
The design and implementation of Ocelot's dynamic binary translator from PTX to multi-core x86, 2009.
Ocelot, Proceedings of the 19th international conference on Parallel architectures and compilation techniques, PACT '10, 2010.
DOI : 10.1145/1854273.1854318
Stream Programming on General-Purpose Processors, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05), pp.343-354, 2005.
DOI : 10.1109/MICRO.2005.32
Intel 64 and IA-32 Architectures Software Developer's Manuals, Architecture, vol.1, 2010.
A characterization and analysis of GPGPU kernels, 2009.
LLVM: A compilation framework for lifelong program analysis & transformation, International Symposium on Code Generation and Optimization (CGO 2004), p.75, 2004.
DOI : 10.1109/CGO.2004.1281665
NVIDIA Tesla: A Unified Graphics and Computing Architecture, IEEE Micro, vol.28, issue.2, pp.39-55, 2008.
DOI : 10.1109/MM.2008.31
The OpenCL specification. Khronos OpenCL Working Group, 2009.
AMD 3DNow! technology: architecture and implementations, IEEE Micro, vol.19, issue.2, pp.37-48, 1999.
DOI : 10.1109/40.755466
Optimizing OpenCL on CPUs, OpenCL BOF in SIGGRAPH, 2010.
Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs, Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, CGO '10, pp.111-119, 2010.
DOI : 10.1145/1772954.1772971
MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs, Languages and Compilers for Parallel Computing: 21st International Workshop, pp.16-30, 2008.
DOI : 10.1007/978-3-540-89740-8_2
Programming inverse memory hierarchy: case of stencils on GPUs, GPU Workshop for Scientific Computing, International Conference on Parallel Computational Fluid Dynamics (ParCFD), 2010.
Benchmarking GPUs to tune dense linear algebra, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2008.
DOI : 10.1109/SC.2008.5214359