
Topology-Aware Load Balancing for Performance Portability over Parallel High Performance Systems

Laércio L. Pilla 1, 2, 3
LIG [2007-2015] - Laboratoire d'Informatique de Grenoble [2007-2015]
3 CORSE [2014-2015] - Compiler Optimization and Run-time Systems [2014-2015]
Inria Grenoble - Rhône-Alpes, LIG [2007-2015] - Laboratoire d'Informatique de Grenoble [2007-2015]
Abstract : This thesis presents our research to provide performance portability and scalability to complex scientific applications running over hierarchical multicore parallel platforms. Performance portability is said to be attained when low core idleness is achieved while mapping a given application to different platforms. It can be affected by performance problems such as load imbalance, costly communications, and overheads coming from the task mapping algorithm. Load imbalance is a result of irregular and dynamic load behaviors, where the amount of work to be processed varies depending on the task and the step of the simulation. Meanwhile, costly communications are caused by a task distribution that does not take into account the different communication times present in a hierarchical platform. This includes nonuniform and asymmetric communication costs at memory and network levels. Lastly, task mapping overheads come from the execution time of the task mapping algorithm trying to mitigate load imbalance and costly communications, and from the migration of tasks. Our approach to achieve the goal of performance portability is based on the hypothesis that precise machine topology information can help task mapping algorithms in their decisions. In this context, we proposed a generic machine topology model of parallel platforms composed of one or more multicore compute nodes. It includes profiled latencies and bandwidths at memory and network levels, and highlights asymmetries and nonuniformity at both levels. This information is employed by our three proposed topology-aware load balancing algorithms, named NucoLB, HwTopoLB, and HierarchicalLB. Besides topology information, these algorithms also employ application information gathered during runtime.
NucoLB focuses on the nonuniform aspects of parallel platforms, while HwTopoLB considers the whole hierarchy in its decisions, and HierarchicalLB combines these algorithms hierarchically to reduce its task mapping overhead. These algorithms seek to mitigate load imbalance and costly communications while averting task migration overheads. Experimental results with the proposed load balancers over different platforms composed of one or more multicore compute nodes showed performance improvements over state-of-the-art load balancing algorithms: NucoLB presented improvements of up to 19% on one compute node; HwTopoLB experienced performance improvements of 19% on average; and HierarchicalLB outperformed HwTopoLB by 22% on average on parallel platforms with ten or more compute nodes. These results were achieved by equalizing work among the available resources, reducing the communication costs experienced by applications, and keeping load balancing overheads low. In this sense, our load balancing algorithms provide performance portability to scientific applications while being independent from application and system architecture.

Contributor : Jean-Francois Méhaut
Submitted on : Monday, April 21, 2014 - 8:06:15 AM
Last modification on : Tuesday, July 21, 2020 - 9:22:02 AM
Long-term archiving on : Monday, April 10, 2017 - 4:16:12 PM


  • HAL Id : tel-00981136, version 1



Laércio L. Pilla. Topology-Aware Load Balancing for Performance Portability over Parallel High Performance Systems. Distributed, Parallel, and Cluster Computing [cs.DC]. Université de Grenoble; UFRGS, 2014. English. ⟨tel-00981136⟩


