, , 2017.
Experiences and lessons learned with a portable interface to hardware performance counters, Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS'03, pp.289-291, 2003. ,
, Oprofile. a system profiler for linux
Perfmon2: a flexible performance monitoring interface for linux, Proceedings of the 2006 Ottawa Linux Symposium, pp.269-288, 2006. ,
Memprof: A memory profiler for numa multicore systems, Proceedings of the Usenix Annual Technical Conference, USENIX ATC'12, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00945731
Cmp$im: A pinbased on-the-fly multi-core cache simulator, Proceedings of the Fourth Annual Workshop on Modeling, Benchmarking and Simulation (MoBS), pp.28-36, 2008. ,
Assessing cache false sharing effects by dynamic binary instrumentation, Proceedings of the Workshop on Binary Instrumentation and Applications, pp.26-33, 2009. ,
Dynamic cache contention detection in multi-threaded applications, Proceedings of the international conference on Virtual Execution Environments, pp.27-38, 2011. ,
Trace-based automatic padding for locality improvement with correlative data visualization interface, Proceedings of the International Conference on Parallel Architectures and Compilation, PACT'07, 2007. ,
PREDATOR: Predictive false sharing detection, Proceedings of the symposium on Principles and Practices of Parallel Programming, PPoPP'14, pp.3-14, 2014. ,
Cheetah: detecting false sharing efficiently and effectively, Proceedings of the international symposium on Code Generation and Optimization, CGO'16, pp.1-11, 2016. ,
Analyzing lock contention in multithreaded applications, Proceedings of the symposium on Principles and Practices of Parallel Programming, PPoPP'10, pp.269-280, 2010. ,
Comprehending performance from real-world execution traces: A device-driver case, Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'14, pp.193-206, 2014. ,
Continuously measuring critical section pressure with the free-lunch profiler, Proceedings of the conference on Object Oriented Programming Systems Languages and Applications, OOPSLA'14, pp.291-307, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01080277
Performance analysis of idle programs, Proceedings of the conference on Object Oriented Programming Systems Languages and Applications, OOPSLA'10, pp.739-753, 2010. ,
There goes the neighborhood: performance degradation due to nearby jobs, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp.1-12, 2013. ,
Active measurement of the impact of network switch utilization on application performance, Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS'14, pp.165-174, 2014. ,
Statistical debugging for real-world performance problems, Proceedings of the conference on Object Oriented Programming Systems Languages and Applications, OOPSLA'14, pp.561-578, 2014. ,
Nonintrusive performance profiling for entire software stacks based on the flow reconstruction principle, Proceedings of the conference on Operating Systems Design and Implementation, OSDI'16, pp.603-618, 2016. ,
Statistical analysis of latency through semantic profiling, Proceedings of the EuroSys European Conference on Computer Systems, EuroSys'17, pp.64-79, 2017. ,
Operating system profiling via latency analysis, Proceedings of the conference on Operating Systems Design and Implementation, OSDI'06, pp.89-102, 2006. ,
Scalability analysis of spmd codes using expectations, Proceedings of the International conference on Supercomputing, ICS'07, pp.13-22, 2007. ,
Eztrace: a generic framework for performance analysis, Proceedings of the International Symposium on Cluster, Cloud and Grid Computing, CCGRID'11, pp.618-619, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00587216
Runtime function instrumentation with EZTrace, Proceedings of PROPER 2012 -Workshop on Productivity and Performance, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00863037
Selecting points of interest in traces using patterns of events, Proceedings of the International Conference on Parallel, Distributed, and NetworkBased Processing, PDP'15, pp.70-77, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01257904
Remote core locking: migrating critical-section execution to improve the performance of multithreaded applications, Proceedings of the Usenix Annual Technical Conference, USENIX ATC'12, pp.65-76, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00779908
False sharing and its effect on shared memory performance, Proceedings of the USENIX Symposium on Experiences with Distributed and Multiprocessor Systems (SEDMS), p.57, 1993. ,
SHERIFF: Precise detection and automatic mitigation of false sharing, Proceedings of the conference on Object Oriented Programming Systems Languages and Applications, OOPSLA'11, pp.3-18, 2011. ,
Fast and portable locking for multicore architectures, ACM Transactions on Computer Systems (TOCS), vol.33, issue.4, p.62, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01252167
Analysis of PARSEC workload scalability, Proceedings of the International Symposium on Performance Analysis of Systems and Software, ISPASS'16, pp.133-142, 2016. ,
Deconstructing the overhead in parallel applications, Proceedings of the International Symposium on Workload Characterization, IISWC'12, pp.59-68, 2012. ,
Evaluating mapreduce for multi-core and multiprocessor systems, Proceedings of the symposium on High Performance Computer Architecture, HPCA'07, pp.13-24, 2007. ,
The SPLASH-2 programs: Characterization and methodological considerations, Proceedings of the International Symposium on Computer Architecture, ISCA'95, pp.24-36, 1995. ,
The PARSEC benchmark suite: Characterization and architectural implications, Proceedings of the International Conference on Parallel Architectures and Compilation, PACT'06, pp.72-81, 2008. ,
Nas parallel benchmarks, Encyclopedia of Parallel Computing, pp.1254-1259, 2011. ,
Benchmarking memory performance with the data cube operator, NASA, Tech. Rep, 2004. ,
Distributed caching with memcached, Linux journal, vol.2004, issue.124, p.5, 2004. ,
memaslap: Load testing and benchmarking a server ,
LevelDB, 2011. ,
Coz: Finding code that counts with causal profiling, Proceedings of the Symposium on Operating Systems Principles, SOSP'15, pp.184-197, 2015. ,
, Facebook rocksdb
Mining precise performance-aware behavioral models from existing instrumentation, Proceedings of the International Conference on Software Engineering, ICSE'14, pp.484-487, 2014. ,
Application-specific quantum for multi-core platform scheduler, Proceedings of the EuroSys European Conference on Computer Systems, EuroSys'16, vol.3, pp.1-3, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01782587
Contention aware execution: Online contention detection and response, Proceedings of the international symposium on Code Generation and Optimization, CGO'10, pp.257-265, 2010. ,
A framework for automated performance bottleneck detection, Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS'08, pp.1-7, 2008. ,
Cache contention and application performance prediction for multi-core systems, Proceedings of the International Symposium on Performance Analysis of Systems and Software, ISPASS'10, pp.76-86, 2010. ,
Optimizing virtual machine consolidation performance on numa server architecture for cloud workloads, Proceedings of the International Symposium on Computer Architecture, ISCA'14, pp.325-336, 2014. ,
Detection of false sharing using machine learning, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp.1-9, 2013. ,
Cachein: a toolset for comprehensive cache inspection, Proceedings of the International Conference on Computational Science, ICCS'05, pp.174-181, 2005. ,
Whose cache line is it anyway?: Operating system support for live detection and repair of false sharing, Proceedings of the EuroSys European Conference on Computer Systems, EuroSys'13, pp.141-154, 2013. ,
Scaanalyzer: A tool to identify memory scalability bottlenecks in parallel programs, Proceedings of the Conference for High Performance Computing, Networking, Storage and Analysis, SC'15, p.47, 2015. ,
Locating cache performance bottlenecks using data profiling, Proceedings of the EuroSys European Conference on Computer Systems, EuroSys'10, pp.335-348, 2010. ,
Traffic management: A holistic approach to memory placement on numa systems, Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'13, pp.381-394, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00945758
NumaMMA: Numa MeMory Analyzer, Proceedings of the International Conference on Parallel Processing, ICPP'18, 2018. ,
URL : https://hal.archives-ouvertes.fr/cea-01854072
Criticality stacks: Identifying critical threads in parallel programs using synchronization behavior, Proceedings of the International Symposium on Computer Architecture, ISCA'13, pp.511-522, 2013. ,
Speedup stacks: Identifying scaling bottlenecks in multi-threaded applications, Proceedings of the International Symposium on Performance Analysis of Systems and Software, ISPASS'12, pp.145-155, 2012. ,