S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer et al., Rodinia: A benchmark suite for heterogeneous computing, 2009 IEEE International Symposium on Workload Characterization (IISWC), pp.44-54, 2009.
DOI : 10.1109/IISWC.2009.5306797

S. Collange, M. Daumas, D. Defour, and D. Parello, Barra: A Parallel Functional Simulator for GPGPU, 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pp.351-360, 2010.
DOI : 10.1109/MASCOTS.2010.43

S. Collange, D. Defour, and Y. Zhang, Dynamic Detection of Uniform and Affine Vectors in GPGPU Computations, Euro-Par 3rd Workshop on Highly Parallel Processing on a Chip (HPPC), LNCS vol.6043, pp.46-55, 2009.
DOI : 10.1007/978-3-642-14122-5_8
URL : https://hal.archives-ouvertes.fr/hal-00396719

S. Collange and A. Kouyoumdjian, Affine Vector Cache for memory bandwidth savings, 2011.
URL : https://hal.archives-ouvertes.fr/ensl-00649200

J. D. Collins, D. M. Tullsen, and H. Wang, Control Flow Optimization Via Dynamic Reconvergence Prediction, 37th International Symposium on Microarchitecture (MICRO-37'04), pp.129-140, 2004.
DOI : 10.1109/MICRO.2004.13

B. W. Coon, P. C. Mills, S. F. Oberman, and M. Y. Siu, Tracking register usage during multithreaded processing using a scoreboard having separate memory regions and storing sequential register size indicators, US Patent 7,434,032, 2008.

G. Dasika, A. Sethia, T. Mudge, and S. Mahlke, PEPSC: A Power-Efficient Processor for Scientific Computing, 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011.
DOI : 10.1109/PACT.2011.16

M. Dechene, E. Forbes, and E. Rotenberg, Multithreaded instruction sharing, 2010.

G. Diamos, A. Kerr, H. Wu, S. Yalamanchili, B. Ashbaugh et al., SIMD re-convergence at thread frontiers, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44 '11, 2011.
DOI : 10.1145/2155620.2155676

R. Dolbeau and A. Seznec, CASH: Revisiting hardware sharing in single-chip parallel processor, Journal of Instruction-Level Parallelism, vol.6, pp.1-16, 2004.
URL : https://hal.archives-ouvertes.fr/inria-00071925

R. Espasa and M. Valero, Multithreaded vector architectures, Proceedings Third International Symposium on High-Performance Computer Architecture, pp.237-244, 1997.
DOI : 10.1109/HPCA.1997.569677
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.5271

P. Fortin, M. Gouicem, and S. Graillat, Towards solving the table maker dilemma on GPU, 20th Euromicro International Conference on Parallel, Distributed and Network-Based Computing, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00642337

W. Fung and T. Aamodt, Thread block compaction for efficient SIMT control flow, 2011 IEEE 17th International Symposium on High Performance Computer Architecture, pp.25-36, 2011.
DOI : 10.1109/HPCA.2011.5749714

W. W. Fung, I. Sham, G. Yuan, and T. M. Aamodt, Dynamic warp formation, ACM Transactions on Architecture and Code Optimization, vol.6, issue.2, pp.1-7, 2009.
DOI : 10.1145/1543753.1543756

M. Gebhart, D. R. Johnson, D. Tarjan, S. W. Keckler, W. J. Dally et al., Energy-efficient mechanisms for managing thread context in throughput processors, Proceedings of the 38th annual international symposium on Computer architecture, pp.235-246, 2011.

A. Glew, Coherent vector lane threading, 2009.

R. Krashinsky, C. Batten, M. Hampton, S. Gerding, B. Pharris et al., The vector-thread architecture, IEEE Micro, vol.24, issue.6, pp.84-90, 2004.
DOI : 10.1109/MM.2004.90

R. Kumar, N. P. Jouppi, and D. M. Tullsen, Conjoined-core chip multiprocessing, IEEE/ACM International Symposium on Microarchitecture, pp.195-206, 2004.
DOI : 10.1109/micro.2004.12
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.111.7776

A. Lashgar and A. Baniasadi, Performance in GPU architectures: Potentials and distances, 9th Annual Workshop on Duplicating, Deconstructing, and Debunking (WDDD11), in conjunction with ISCA-38, 2011.

V. W. Lee, C. Kim, J. Chhugani, M. Deisher, D. Kim et al., Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU, ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture, pp.451-460, 2010.

J. E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym, NVIDIA Tesla: A Unified Graphics and Computing Architecture, IEEE Micro, vol.28, issue.2, pp.39-55, 2008.
DOI : 10.1109/MM.2008.31

G. Long, D. Franklin, S. Biswas, P. Ortiz, J. Oberg et al., Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pp.337-348, 2010.
DOI : 10.1109/MICRO.2010.41

J. Meng, D. Tarjan, and K. Skadron, Dynamic warp subdivision for integrated branch and memory divergence tolerance, ACM SIGARCH Computer Architecture News, vol.38, issue.3, pp.235-246, 2010.
DOI : 10.1145/1816038.1815992

P. C. Mills, J. E. Lindholm, B. W. Coon, G. M. Tarolli, and J. M. Burgess, Scheduler in multi-threaded processor prioritizing instructions passing qualification rule, 2011.

V. Narasiman, C. J. Lee, M. Shebanow, R. Miftakhutdinov, O. Mutlu et al., Improving GPU performance via large warps and two-level warp scheduling, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44 '11, 2011.
DOI : 10.1145/2155620.2155656
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.357.9840

J. Nickolls and W. J. Dally, The GPU Computing Era, IEEE Micro, vol.30, issue.2, pp.56-69, 2010.
DOI : 10.1109/MM.2010.41

K. Pagiamtzis and A. Sheikholeslami, Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey, IEEE Journal of Solid-State Circuits, vol.41, issue.3, pp.712-727, 2006.
DOI : 10.1109/JSSC.2005.864128

S. Rivoire, R. Schultz, T. Okuda, and C. Kozyrakis, Vector Lane Threading, 2006 International Conference on Parallel Processing (ICPP'06), pp.55-64, 2006.
DOI : 10.1109/ICPP.2006.74

D. M. Tullsen, S. J. Eggers, and H. M. Levy, Simultaneous multithreading, ACM SIGARCH Computer Architecture News, vol.23, issue.2, pp.392-403, 1995.
DOI : 10.1145/225830.224449