Does Shared-Memory, Highly Multi-Threaded, Single-Application Scale on Many-Cores?

Abstract: Nowadays, single-chip cache-coherent multi-cores with up to 100 cores are a reality, and many-cores with hundreds of cores are planned for the near future. Because of the large number of cores, and for power-efficiency reasons (performance per watt), cores are becoming simpler, with small caches. To make efficient use of the parallelism these architectures offer, applications must be multi-threaded. The POSIX Threads (PThreads) standard is the most portable way to use threads across operating systems. It is also used as a low-level layer to support other portable, shared-memory, parallel environments such as OpenMP. In this paper, we experimentally verify the scalability of shared-memory, PThreads-based applications on a simulated 512-core architecture, using Cycle-Accurate-Bit-Accurate (CABA) simulation. Using two unmodified, highly multi-threaded applications, SPLASH-2 FFT and EPFilter (a medical-image noise-filtering application provided by Philips), our study shows a scalability limitation beyond 64 cores for FFT and beyond 256 cores for EPFilter. Based on hardware event counters, our analysis shows that (i) the detected scalability limitation is a conceptual problem related to the notions of thread and process; and (ii) the small per-core caches found in many-cores exacerbate the problem. Finally, we present the principle of our solution and outline future work.
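For readers unfamiliar with the programming model under study, the following is a minimal sketch of a shared-memory PThreads application in the fork-join style: a single process whose threads all read one shared array, each summing its own slice. It illustrates the model only and is not code from the paper or from FFT/EPFilter; the names `parallel_sum`, `sum_slice`, `struct slice` and the `MAX_THREADS` bound are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical upper bound on worker threads for this sketch. */
#define MAX_THREADS 64

/* Work descriptor: one worker sums data[begin, end). All workers read the
 * same array, which lives in the single shared address space of the process. */
struct slice {
    const double *data;
    size_t begin;
    size_t end;
    double partial; /* written once by the worker, read after pthread_join */
};

static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    double acc = 0.0; /* accumulate locally to avoid writing shared memory in the loop */
    for (size_t i = s->begin; i < s->end; i++)
        acc += s->data[i];
    s->partial = acc;
    return NULL;
}

/* Fork-join: create nthreads workers, wait for all of them, combine results. */
double parallel_sum(const double *data, size_t n, int nthreads)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    int created[MAX_THREADS];
    struct slice work[MAX_THREADS];
    size_t chunk = n / (size_t)nthreads;

    for (int t = 0; t < nthreads; t++) {
        work[t].data = data;
        work[t].begin = (size_t)t * chunk;
        /* The last worker also takes the remainder of the division. */
        work[t].end = (t == nthreads - 1) ? n : work[t].begin + chunk;
        work[t].partial = 0.0;
        created[t] = (pthread_create(&tid[t], NULL, sum_slice, &work[t]) == 0);
        if (!created[t])
            sum_slice(&work[t]); /* fallback: do this slice in the caller */
    }

    double total = 0.0;
    for (int t = 0; t < nthreads; t++) {
        if (created[t])
            pthread_join(tid[t], NULL);
        total += work[t].partial;
    }
    return total;
}
```

Each worker accumulates in a local variable and writes its shared `partial` slot only once, which keeps writes to shared cache lines out of the inner loop; on many-cores with small per-core caches, that kind of coherence traffic is one of the costs worth watching. Compile with `-pthread`.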

https://hal.archives-ouvertes.fr/hal-00742947
Contributor: Ghassan Almaless
Submitted on: Wednesday, October 17, 2012 - 4:52:06 PM
Last modification on: Thursday, March 21, 2019 - 1:07:57 PM

Identifiers

  • HAL Id: hal-00742947, version 1

Citation

Ghassan Almaless, Franck Wajsburt. Does Shared-Memory, Highly Multi-Threaded, Single-Application Scale on Many-Cores? 4th USENIX Workshop on Hot Topics in Parallelism, Jun 2012, Berkeley, CA, United States. ⟨hal-00742947⟩
