Load Balancing for P-Grade Parallel Applications

Author(s):  
Márton László Tóth ◽  
Norbert Podhorszki ◽  
Peter Kacsuk
Author(s):  
Gengbin Zheng ◽  
Abhinav Bhatelé ◽  
Esteban Meneses ◽  
Laxmikant V. Kalé

Large parallel machines with hundreds of thousands of processors are becoming more prevalent. Ensuring good load balance is critical for scaling certain classes of parallel applications on even thousands of processors. Centralized load balancing algorithms suffer from scalability problems, especially on machines with a relatively small amount of memory. Fully distributed load balancing algorithms, on the other hand, tend to take longer to arrive at good solutions. In this paper, we present an automatic dynamic hierarchical load balancing method that overcomes the scalability challenges of centralized schemes and longer running times of traditional distributed schemes. Our solution overcomes these issues by creating multiple levels of load balancing domains which form a tree. This hierarchical method is demonstrated within a measurement-based load balancing framework in Charm++. We discuss techniques to deal with scalability challenges of load balancing at very large scale. We present performance data of the hierarchical load balancing method on up to 16,384 cores of Ranger (at the Texas Advanced Computing Center) and 65,536 cores of Intrepid (the Blue Gene/P at Argonne National Laboratory) for a synthetic benchmark. We also demonstrate the successful deployment of the method in a scientific application, NAMD, with results on Intrepid.
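The idea of load balancing domains arranged in a tree can be illustrated with a minimal two-level sketch. This is not the Charm++ implementation described in the abstract: the function names (`greedy_assign`, `hierarchical_balance`), the greedy largest-first strategy, and the fixed two-level tree are all assumptions chosen for illustration. The root splits task loads across groups, and each group then balances its own processors independently, so no single balancer has to handle every task-to-processor decision.

```python
from typing import List


def greedy_assign(tasks: List[float], n_bins: int) -> List[List[float]]:
    """Assign task loads to bins, largest first, always to the least-loaded bin.

    Hypothetical helper: a simple greedy strategy standing in for whatever
    balancing strategy a real framework would apply at each level.
    """
    bins: List[List[float]] = [[] for _ in range(n_bins)]
    totals = [0.0] * n_bins
    for load in sorted(tasks, reverse=True):
        i = totals.index(min(totals))  # first least-loaded bin
        bins[i].append(load)
        totals[i] += load
    return bins


def hierarchical_balance(
    tasks: List[float], n_groups: int, procs_per_group: int
) -> List[List[List[float]]]:
    """Two-level balancing: the root splits work across groups (domains),
    then each group balances its own processors independently."""
    # Root level: coarse distribution of tasks across groups.
    groups = greedy_assign(tasks, n_groups)
    # Leaf level: each group only sees its own share of the tasks.
    return [greedy_assign(group, procs_per_group) for group in groups]


if __name__ == "__main__":
    placement = hierarchical_balance([5, 3, 3, 2, 2, 1], n_groups=2, procs_per_group=2)
    for g, group in enumerate(placement):
        for p, proc_tasks in enumerate(group):
            print(f"group {g} proc {p}: tasks={proc_tasks} load={sum(proc_tasks)}")
```

In this sketch the leaf-level calls are independent of one another, which is what would allow them to run in parallel on a real machine. A production scheme would also measure loads at run time and migrate existing work rather than assign it from scratch.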


2012 ◽  
Vol 13 (6) ◽  
pp. 413-427 ◽  
Author(s):  
Eunsung Kim ◽  
Hyeonsang Eom ◽  
Heon Y. Yeom

Author(s):  
Eric Aubanel

The problem of load balancing parallel applications is particularly challenging on computational grids, since the characteristics of both the application and the platform must be taken into account. This chapter reviews the wide range of solutions that have been proposed. It considers tightly coupled parallel applications that can be described by an undirected graph representing the concurrent execution of tasks and the communication between them, executing on computational grids with static and dynamic network and processor performance. While a rich set of solution techniques has been proposed, there have not yet been any performance comparisons between them. Such comparisons will require parallel benchmarks and computational grid emulators and simulators.


Author(s):  
Stanley Y. Chien ◽  
Gun Makinabakan ◽  
Akin Ecer ◽  
Hasan U. Akay

Author(s):  
Wouter Joosen ◽  
Stijn Bijnens ◽  
Bert Robben ◽  
Johan Van Oeyen ◽  
Pierre Verbaeten
