]. G. Amd67, D. H. Amdahl, E. Bailey, J. T. Barszcz, D. S. Barton et al., The validity of the single processor approach to achieving large scale computing capabilities The NAS Parallel Benchmarks ? Summary and Preliminary Results, AFIPS Conference ProceedingsBBB + 91] Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, SC'91, pp.483-485, 1967.

B. D. Bui, M. Caccamo, L. Sha, and J. Martinez, Impact of Cache Partitioning on Multi-tasking Real Time Embedded Systems, 2008 14th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, pp.101-110, 2008.
DOI : 10.1109/RTCSA.2008.42

S. Blagodurov, S. Zhuravlev, and A. Fedorova, Contention-Aware Scheduling on Multicore Systems, ACM Transactions on Computer Systems, vol.28, issue.4, pp.1-8, 2010.
DOI : 10.1145/1880018.1880019

F. Cappello, D. Etiemble, I. Mpi-versus-mpi+openmp-on, . Sp, J. Gaud et al., A Practical Method for Estimating Performance Degradation on Multicore Processors, and Its Application to HPC Workloads, SC '00 Proc. Int. conf. High Performance Computing, Networking, Storage and Analysis, SC '12, pp.1-8311, 2000.

]. D. Djf-+-15, E. Dauwe, R. Jonardi, S. Friese, A. A. Pasricha et al., A methodology for co-location aware application performance modeling in multicore computing, Parallel and Distributed Processing Symposium Workshop (IPDPSW), pp.434-443, 2015.

J. D. , A. Gainaru, and G. Aupy, Report on the sunway taihulight system. PDF). www. netlib. org Scheduling the I/O of HPC applications under congestion, IEEE Int. Parallel and Distributed Processing Symposium (IPDPS), pp.1013-1022, 2015.

N. Guan, M. Stigge, W. Yi, and G. Yu, Cache-aware scheduling and analysis for multicores, Proceedings of the seventh ACM international conference on Embedded software, EMSOFT '09, pp.245-254, 2009.
DOI : 10.1145/1629335.1629369

URL : http://user.it.uu.se/~yi/pdf-files/emsoft09-yi.pdf

T. Michael and . Heath, A tale of two laws, Int. J. High Performance Computing Applications, vol.29, issue.3, pp.320-330, 2015.

A. Hartstein, V. Srinivasan, P. Puzak, and . Emma, On the nature of cache miss behavior: Is it ? 2. The Journal of Instruction-Level Parallelism, pp.1-22, 2008.

L. He, H. Zhu, and S. A. Jarvis, Developing Graph-Based Co-Scheduling Algorithms on Multicore Computers, IEEE Transactions on Parallel and Distributed Systems, vol.27, issue.6, pp.1617-1632, 2016.
DOI : 10.1109/TPDS.2015.2468223

. Intel, Intel 64 and IA-32 architectures software developer's manual, 3B: System Programming Guide, 2014.

Y. Jiang, X. Shen, J. Chen, and R. Tripathi, Analysis and approximation of optimal co-scheduling on chip multiprocessors, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, PACT '08, pp.220-229, 2008.
DOI : 10.1145/1454115.1454146

E. Kultursay, M. Kandemir, A. Sivasubramaniam, and O. Mutlu, Evaluating STT-RAM as an energy-efficient main memory alternative, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp.256-267, 2013.
DOI : 10.1109/ISPASS.2013.6557176

URL : http://www.pdl.cmu.edu/PDL-FTP/NVM/sttram_ispass13.pdf

A. Krishna, A. Samih, and Y. Solihin, Data sharing in multi-threaded applications and its impact on chip design, 2012 IEEE International Symposium on Performance Analysis of Systems & Software, pp.125-134, 2012.
DOI : 10.1109/ISPASS.2012.6189219

D. Lo, L. Cheng, R. Govindaraju, P. Ranganathan, and C. Kozyrakis, Improving Resource Efficiency at Scale with Heracles, ACM Transactions on Computer Systems, vol.34, issue.2, p.6, 2016.
DOI : 10.1109/MICRO.2014.53

URL : http://dl.acm.org/ft_gateway.cfm?id=2882783&type=pdf

J. Leverich and C. Kozyrakis, Reconciling high server utilization and sub-millisecond quality-of-service, Proceedings of the Ninth European Conference on Computer Systems, EuroSys '14, p.4, 2014.
DOI : 10.1145/2592798.2592821

M. A. Laurenzano, M. M. Tikir, L. Carrington, and A. Snavely, PEBIL: Efficient static binary instrumentation for Linux, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), pp.175-183, 2010.
DOI : 10.1109/ISPASS.2010.5452024

URL : http://www.sdsc.edu/PMaC/publications/pubs/laurenzanopebil2010.pdf

D. Molka, D. Hackenberg, R. Schone, and W. E. Nagel, Cache Coherence Protocol and Memory Performance of the Intel Haswell-EP Architecture, 2015 44th International Conference on Parallel Processing, pp.739-748, 2015.
DOI : 10.1109/ICPP.2015.83

L. Sai-prashanth-muralidhara, O. Subramanian, M. Mutlu, T. Kandemir, and . Moscibroda, Reducing memory interference in multicore systems via application-aware memory channel partitioning, Proc. 44th IEEE/ACM Int. Sym. Microarchitecture, pp.44-374, 2011.

A. J. Pena and P. Balaji, Toward the efficient use of multiple explicitly managed memory subsystems, 2014 IEEE International Conference on Cluster Computing (CLUSTER), pp.123-131, 2014.
DOI : 10.1109/CLUSTER.2014.6968756

K. Moinuddin, Y. N. Qureshi, and . Patt, Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches, Proc. 39th IEEE/ACM Int

M. Symp, M. Brian, A. Rogers, . Krishna, B. Gordon et al., Scaling the bandwidth wall: challenges in and avenues for CMP scaling Largescale compute-intensive analysis via a combined in-situ and co-scheduling workflow approach, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC'15, pp.423-432371, 2006.

K. Tian, Y. Jiang, and X. Shen, A study on optimally co-scheduling jobs of different lengths on chip multiprocessors, Proceedings of the 6th ACM conference on Computing frontiers, CF '09, pp.41-50, 2009.
DOI : 10.1145/1531743.1531752

S. Zhuravlev, S. Blagodurov, A. Fedorova, H. Zhu, L. He et al., Addressing shared resource contention in multicore processors via scheduling, 44th Int. Conf. Parallel Processing (ICPP), pp.129-142, 2010.
DOI : 10.1145/1735971.1736036

URL : http://www.cs.sfu.ca/~fedorova/papers/asplos212-zhuravlev.pdf

Y. Zhang, A. Michael, J. Laurenzano, L. Mars, and . Tang, SMiTe: Precise QoS Prediction on Real-System SMT Processors to Improve Utilization in Warehouse Scale Computers, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp.406-418, 2014.
DOI : 10.1109/MICRO.2014.53