J. E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym, NVIDIA Tesla: A Unified Graphics and Computing Architecture, IEEE Micro, vol.28, issue.2, pp.39-55, 2008.
DOI : 10.1109/MM.2008.31

A. Bakhoda, G. Yuan, W. W. Fung, H. Wong, and T. M. Aamodt, Analyzing CUDA workloads using a detailed GPU simulator, 2009 IEEE International Symposium on Performance Analysis of Systems and Software, pp.163-174, 2009.
DOI : 10.1109/ISPASS.2009.4919648

G. Diamos, A. Kerr, and M. Kesavan, Translating GPU binaries to tiered SIMD architectures with Ocelot, 2009.

E. Lindholm, M. Y. Siu, S. S. Moy, S. Liu, and J. R. Nickolls, Simulating multiported memories using lower port count memories, US Patent US, pp.7339592-7339594, 2008.

S. Balakrishnan and G. S. Sohi, Exploiting value locality in physical register files, 22nd Digital Avionics Systems Conference. Proceedings (Cat. No.03CH37449), p.265, 2003.
DOI : 10.1109/MICRO.2003.1253201

G. Rizk and D. Lavenier, GPU accelerated RNA folding algorithm, In: Computational Science ? ICCS LNCS, vol.5544, pp.1004-1013, 2009.
DOI : 10.1016/b978-0-12-384988-5.00014-0
URL : https://hal.archives-ouvertes.fr/hal-00637827

S. Mueller, C. Jacobi, H. J. Oh, K. Tran, S. Cottier et al., The Vector Floating-Point Unit in a Synergistic Processor Element of a CELL Processor, 17th IEEE Symposium on Computer Arithmetic (ARITH'05), pp.59-67, 2005.
DOI : 10.1109/ARITH.2005.45

S. Collange, D. Defour, and A. Tisserand, Power Consuption of GPUs from a Software Perspective, Lecture Notes in Computer Science, vol.5544, pp.922-931, 2009.