E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak et al., Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, p.12037, 2009.
DOI : 10.1088/1742-6596/180/1/012037

F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.180-186, 2010.
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889

D. d'Humières, Generalized lattice-Boltzmann equations, Rarefied Gas Dynamics: Theory and Simulations, pp.450-458, 1994.

D. d'Humières, I. Ginzburg, M. Krafczyk, P. Lallemand, and L. Luo, Multiple-relaxation-time lattice Boltzmann models in three dimensions, Philosophical Transactions: Mathematical, Physical and Engineering Sciences, pp.437-451, 2002.

J. Dongarra, S. Moore, G. Peterson, S. Tomov, J. Allred et al., Exploring New Architectures in Accelerating CFD for Air Force Applications, 2008 DoD HPCMP Users Group Conference, pp.14-17, 2008.
DOI : 10.1109/DoD.HPCMP.UGC.2008.12

Z. Fan, F. Qiu, A. Kaufman, and S. Yoakum-Stover, GPU cluster for high performance computing, Proceedings of the 2004 ACM/IEEE conference on Supercomputing, p.47, 2004.

U. Frisch, B. Hasslacher, and Y. Pomeau, Lattice-Gas Automata for the Navier-Stokes Equation, Physical Review Letters, vol.56, issue.14, pp.1505-1508, 1986.
DOI : 10.1103/PhysRevLett.56.1505

J. Latt, Palabos Benchmarks (3D Lid-driven Cavity on Blue Gene/P)

G. R. McNamara and G. Zanetti, Use of the Boltzmann Equation to Simulate Lattice-Gas Automata, Physical Review Letters, vol.61, issue.20, pp.2332-2335, 1988.
DOI : 10.1103/PhysRevLett.61.2332

C. Obrecht, F. Kuznik, B. Tourancheau, and J. Roux, A new approach to the lattice Boltzmann method for graphics processing units, Computers & Mathematics with Applications, vol.61, issue.12, 2010.
DOI : 10.1016/j.camwa.2010.01.054
URL : https://hal.archives-ouvertes.fr/inria-00568674

C. Obrecht, F. Kuznik, B. Tourancheau, and J. Roux, Global Memory Access Modelling for Efficient Implementation of the LBM on GPUs, High Performance Computing for Computational Science - VECPAR 2010, Lecture Notes in Computer Science, 2010.
URL : https://hal.archives-ouvertes.fr/hal-01003059

E. Riegel, T. Indinger, and N. Adams, Implementation of a Lattice-Boltzmann method for numerical fluid mechanics using the nVIDIA CUDA technology, Computer Science - Research and Development, vol.8, issue.4, pp.241-247, 2009.
DOI : 10.1007/s00450-009-0087-3

G. Ruetsch and P. Micikevicius, Optimizing matrix transpose in CUDA. NVIDIA CUDA SDK Application Note, 2009.

J. Tölke, Implementation of a Lattice Boltzmann kernel using the Compute Unified Device Architecture developed by nVIDIA, Computing and Visualization in Science, vol.17, issue.4, pp.1-11, 2008.
DOI : 10.1007/s00791-008-0120-2

J. Tölke and M. Krafczyk, TeraFLOP computing on a desktop PC with GPUs for 3D CFD, International Journal of Computational Fluid Dynamics, vol.77, issue.7, pp.443-456, 2008.
DOI : 10.1002/cav.143