A Unified Framework for Designing EPTAS’s for Load Balancing on Parallel Machines

Author(s):  
Ishai Kones ◽  
Asaf Levin
Author(s):  
Gengbin Zheng ◽  
Abhinav Bhatelé ◽  
Esteban Meneses ◽  
Laxmikant V. Kalé

Large parallel machines with hundreds of thousands of processors are becoming more prevalent. Ensuring good load balance is critical for scaling certain classes of parallel applications on even thousands of processors. Centralized load balancing algorithms suffer from scalability problems, especially on machines with a relatively small amount of memory. Fully distributed load balancing algorithms, on the other hand, tend to take longer to arrive at good solutions. In this paper, we present an automatic dynamic hierarchical load balancing method that overcomes the scalability challenges of centralized schemes and longer running times of traditional distributed schemes. Our solution overcomes these issues by creating multiple levels of load balancing domains which form a tree. This hierarchical method is demonstrated within a measurement-based load balancing framework in Charm++. We discuss techniques to deal with scalability challenges of load balancing at very large scale. We present performance data of the hierarchical load balancing method on up to 16,384 cores of Ranger (at the Texas Advanced Computing Center) and 65,536 cores of Intrepid (the Blue Gene/P at Argonne National Laboratory) for a synthetic benchmark. We also demonstrate the successful deployment of the method in a scientific application, NAMD, with results on Intrepid.
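
The two-level idea in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Charm++ implementation: the names `greedy_balance` and `hierarchical_balance`, the fixed `group_size`, and the choice of a greedy largest-first strategy are all hypothetical. The root compares only per-group totals and moves tasks out of overloaded groups; each group then balances its own processors independently.

```python
import heapq


def greedy_balance(tasks, n_bins):
    """Place each task, largest first, on the currently least-loaded bin."""
    heap = [(0.0, b) for b in range(n_bins)]
    bins = [[] for _ in range(n_bins)]
    for t in sorted(tasks, reverse=True):
        load, b = heapq.heappop(heap)
        bins[b].append(t)
        heapq.heappush(heap, (load + t, b))
    return bins


def hierarchical_balance(proc_tasks, group_size):
    """Two-level sketch (assumed design): balance between groups at the
    root, then balance inside each group."""
    n_procs = len(proc_tasks)
    groups = [proc_tasks[i:i + group_size] for i in range(0, n_procs, group_size)]
    # Each group's pool holds every task currently owned by its processors.
    pools = [[t for proc in g for t in proc] for g in groups]
    per_proc = sum(sum(p) for p in pools) / n_procs
    # excess[g] > 0 means group g holds more than its fair share of load.
    excess = [sum(p) - per_proc * len(g) for p, g in zip(pools, groups)]

    # Root level: only group totals are compared.
    for src in range(len(pools)):
        while excess[src] > 0 and pools[src]:
            dst = min(range(len(pools)), key=lambda g: excess[g])
            t = min(pools[src])  # smallest task in the overloaded group
            if dst == src or excess[dst] + t >= excess[src]:
                break  # moving this task would not reduce the imbalance
            pools[src].remove(t)
            pools[dst].append(t)
            excess[src] -= t
            excess[dst] += t

    # Leaf level: every group balances its own processors independently.
    result = []
    for pool, g in zip(pools, groups):
        result.extend(greedy_balance(pool, len(g)))
    return result


if __name__ == "__main__":
    # All load starts on processor 0; groups of two processors each.
    print(hierarchical_balance([[4, 4, 4, 4], [], [], []], group_size=2))
```

In this sketch the root never inspects individual processors, which mirrors the motivation given in the abstract: no single node has to hold the load data of the whole machine.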


2008 ◽  
Vol 392-394 ◽  
pp. 250-255
Author(s):  
Yong Zhan ◽  
Chang Hua Qiu ◽  
Kai Xue

This paper considers the practical manufacturing environment of the hybrid flow shop (HFS) with non-identical machines in parallel. Maintaining load balance among parallel machines is very important for significantly enhancing manufacturing performance. The aim of this paper is to minimize makespan with load balancing in a non-identical parallel machine environment by using a hybrid genetic algorithm (HGA). In the HGA, a neighborhood search-based method is used together with the genetic algorithm as a local optimization method to balance the exploration and exploitation abilities. The chromosome representation used in this paper is composed of two layers, an allocation layer and a sequencing layer, which can be encoded and decoded easily. In generating the initial population, a special constraint of load balancing between parallel machines is used to reduce the number of individuals. A particular crossover operation is used that generates multiple offspring at a time, so that the efficiency of the algorithm is improved. Finally, the proposed algorithm is tested on a benchmark, and a numerical example shows good results.
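
To make the two-layer chromosome concrete, here is a minimal Python sketch of one plausible decoding. The abstract does not give the exact encoding, so everything here is an assumption: `allocation[j]` is taken to be the machine chosen for job `j`, `sequence` is a permutation giving the processing order, `proc_time[m][j]` is a hypothetical table of job times on non-identical machines, and only a single stage of the flow shop is modeled.

```python
def decode(allocation, sequence, proc_time):
    """Decode a two-layer chromosome into per-machine job lists and loads.

    allocation[j]   -> machine assigned to job j (allocation layer)
    sequence        -> permutation of job indices (sequencing layer)
    proc_time[m][j] -> time of job j on machine m (machines differ)
    """
    n_machines = len(proc_time)
    schedule = [[] for _ in range(n_machines)]
    # Jobs are appended in sequence order, so each machine's list is ordered.
    for j in sequence:
        schedule[allocation[j]].append(j)
    loads = [sum(proc_time[m][j] for j in jobs) for m, jobs in enumerate(schedule)]
    return schedule, loads


if __name__ == "__main__":
    allocation = [0, 1, 0, 1]           # jobs 0 and 2 on machine 0, jobs 1 and 3 on machine 1
    sequence = [2, 0, 3, 1]             # processing priority of the jobs
    proc_time = [[3, 2, 4, 5],          # times on machine 0
                 [6, 1, 8, 2]]          # times on machine 1
    schedule, loads = decode(allocation, sequence, proc_time)
    print(schedule)                     # [[2, 0], [3, 1]]
    print(loads, max(loads))            # [7, 3] 7
```

With a decoder of this kind, the makespan of the single stage is `max(loads)`, and the spread between the largest and smallest load gives one possible measure of imbalance that a fitness function could penalize.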


2017 ◽  
Vol 27 (12) ◽  
pp. 2768-2774
Author(s):  
Rainald Löhner ◽  
Fumiya Togashi ◽  
Joseph David Baum

Purpose: A common observation made when computing chemically reacting flows is how central processing unit (CPU)-intensive these are in comparison to cold flow cases. The update of tens or hundreds of species with hundreds or thousands of reactions can easily consume more than 95% of the total CPU time. In many cases, the region where reactions (combustion) are actually taking place comprises only a very small percentage of the volume. Typical examples are flame fronts propagating through a domain. In such cases, only a small fraction of points/cells needs a full chemistry update. This leads to extreme load imbalances on parallel machines. The purpose of the present work is to develop a methodology to balance the work in an optimal way.
Design/methodology/approach: Points that require a full chemistry update are identified, gathered and distributed across the network, so that work is evenly distributed. Once the chemistry has been updated, the unknowns are gathered back.
Findings: The procedure has been found to work extremely well, leading to optimal load balance with insignificant communication overheads.
Research limitations/implications: In many production runs, the procedure leads to a reduction in CPU requirements of more than an order of magnitude. This allows much larger and longer runs, improving accuracy and statistics.
Practical implications: The procedure has allowed the calculation of chemically reacting flow cases that were hitherto not possible.
Originality/value: To the authors’ knowledge, this type of load balancing has not been published before.
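
The gather-and-distribute step described in this abstract can be illustrated with a small planning routine. This is a hedged, single-process sketch rather than the authors' method: the function name `plan_transfers` and the rule of matching surplus ranks to deficit ranks in index order are assumptions, and no actual message passing is performed. Given the number of points needing a full chemistry update on each rank, it returns the moves that leave every rank with an almost equal count.

```python
def plan_transfers(active_counts):
    """Return (src, dst, n) moves that even out the active-point counts.

    active_counts[r] is the number of points on rank r that need a full
    chemistry update. After the moves, counts differ by at most one.
    """
    n_ranks = len(active_counts)
    base, extra = divmod(sum(active_counts), n_ranks)
    # The first `extra` ranks take one additional point each.
    target = [base + (1 if r < extra else 0) for r in range(n_ranks)]
    surplus = [[r, c - t] for r, (c, t) in enumerate(zip(active_counts, target)) if c > t]
    deficit = [[r, t - c] for r, (c, t) in enumerate(zip(active_counts, target)) if c < t]

    moves = []
    i = j = 0
    while i < len(surplus) and j < len(deficit):
        n = min(surplus[i][1], deficit[j][1])
        moves.append((surplus[i][0], deficit[j][0], n))
        surplus[i][1] -= n
        deficit[j][1] -= n
        if surplus[i][1] == 0:
            i += 1
        if deficit[j][1] == 0:
            j += 1
    return moves


if __name__ == "__main__":
    # A flame front sits almost entirely on rank 0.
    print(plan_transfers([10, 0, 2, 0]))  # [(0, 1, 3), (0, 2, 1), (0, 3, 3)]
```

Once the receiving ranks have updated the chemistry for the borrowed points, the same list of moves can be replayed in reverse to return the updated unknowns to their owners, which corresponds to the "gathered back" step in the abstract.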


Author(s):  
SYLVAIN CONTASSOT-VIVIER ◽  
SERGE MIGUET

This paper introduces and compares three parallel algorithms to compute general geometric image transformations on MIMD machines. We propose three variants of a general parallel scheme. We focus on load balancing and data redistribution. Experimental results are reported and compared. The implementation uses the PPCM library, which allows us to run the program on different parallel machines. We compare logical communication schemes for message-passing machines. Since our parallel algorithm needs global communications such as multiscatters, we study the efficiency of two different logical topologies usable with PPCM. These studies allow us to find the best combination of algorithm and virtual topology to use on a given parallel machine.

