The validity of the single processor approach to achieving large scale computing capabilities, AFIPS Conference Proceedings, pp.483-485, 1967.
The NAS Parallel Benchmarks - Summary and Preliminary Results, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, SC'91, 1991.
Impact of Cache Partitioning on Multi-tasking Real Time Embedded Systems, 2008 14th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, pp.101-110, 2008.
DOI : 10.1109/RTCSA.2008.42
Contention-Aware Scheduling on Multicore Systems, ACM Transactions on Computer Systems, vol.28, issue.4, pp.1-8, 2010.
DOI : 10.1145/1880018.1880019
A Practical Method for Estimating Performance Degradation on Multicore Processors, and Its Application to HPC Workloads, Proc. Int. Conf. High Performance Computing, Networking, Storage and Analysis, SC '12, article 83, pp.1-11, 2012.
A methodology for co-location aware application performance modeling in multicore computing, Parallel and Distributed Processing Symposium Workshop (IPDPSW), pp.434-443, 2015.
Report on the Sunway TaihuLight system, www.netlib.org.
Scheduling the I/O of HPC applications under congestion, IEEE Int. Parallel and Distributed Processing Symposium (IPDPS), pp.1013-1022, 2015.
Cache-aware scheduling and analysis for multicores, Proceedings of the seventh ACM international conference on Embedded software, EMSOFT '09, pp.245-254, 2009.
DOI : 10.1145/1629335.1629369
URL : http://user.it.uu.se/~yi/pdf-files/emsoft09-yi.pdf
A tale of two laws, Int. J. High Performance Computing Applications, vol.29, issue.3, pp.320-330, 2015.
On the nature of cache miss behavior: Is it √2?, The Journal of Instruction-Level Parallelism, pp.1-22, 2008.
Developing Graph-Based Co-Scheduling Algorithms on Multicore Computers, IEEE Transactions on Parallel and Distributed Systems, vol.27, issue.6, pp.1617-1632, 2016.
DOI : 10.1109/TPDS.2015.2468223
Intel 64 and IA-32 architectures software developer's manual, Volume 3B: System Programming Guide, 2014.
Analysis and approximation of optimal co-scheduling on chip multiprocessors, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, PACT '08, pp.220-229, 2008.
DOI : 10.1145/1454115.1454146
Evaluating STT-RAM as an energy-efficient main memory alternative, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp.256-267, 2013.
DOI : 10.1109/ISPASS.2013.6557176
URL : http://www.pdl.cmu.edu/PDL-FTP/NVM/sttram_ispass13.pdf
Data sharing in multi-threaded applications and its impact on chip design, 2012 IEEE International Symposium on Performance Analysis of Systems & Software, pp.125-134, 2012.
DOI : 10.1109/ISPASS.2012.6189219
Improving Resource Efficiency at Scale with Heracles, ACM Transactions on Computer Systems, vol.34, issue.2, p.6, 2016.
DOI : 10.1145/2882783
URL : http://dl.acm.org/ft_gateway.cfm?id=2882783&type=pdf
Reconciling high server utilization and sub-millisecond quality-of-service, Proceedings of the Ninth European Conference on Computer Systems, EuroSys '14, p.4, 2014.
DOI : 10.1145/2592798.2592821
PEBIL: Efficient static binary instrumentation for Linux, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), pp.175-183, 2010.
DOI : 10.1109/ISPASS.2010.5452024
URL : http://www.sdsc.edu/PMaC/publications/pubs/laurenzanopebil2010.pdf
Cache Coherence Protocol and Memory Performance of the Intel Haswell-EP Architecture, 2015 44th International Conference on Parallel Processing, pp.739-748, 2015.
DOI : 10.1109/ICPP.2015.83
Reducing memory interference in multicore systems via application-aware memory channel partitioning, Proc. 44th IEEE/ACM Int. Sym. Microarchitecture, MICRO-44, p.374, 2011.
Toward the efficient use of multiple explicitly managed memory subsystems, 2014 IEEE International Conference on Cluster Computing (CLUSTER), pp.123-131, 2014.
DOI : 10.1109/CLUSTER.2014.6968756
Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches, Proc. 39th IEEE/ACM Int. Sym. Microarchitecture, pp.423-432, 2006.
Scaling the bandwidth wall: challenges in and avenues for CMP scaling, p.371.
Large-scale compute-intensive analysis via a combined in-situ and co-scheduling workflow approach, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC'15, 2015.
A study on optimally co-scheduling jobs of different lengths on chip multiprocessors, Proceedings of the 6th ACM conference on Computing frontiers, CF '09, pp.41-50, 2009.
DOI : 10.1145/1531743.1531752
Addressing shared resource contention in multicore processors via scheduling, Proc. 15th Int. Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp.129-142, 2010.
DOI : 10.1145/1735971.1736036
URL : http://www.cs.sfu.ca/~fedorova/papers/asplos212-zhuravlev.pdf
SMiTe: Precise QoS Prediction on Real-System SMT Processors to Improve Utilization in Warehouse Scale Computers, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp.406-418, 2014.
DOI : 10.1109/MICRO.2014.53