scholarly journals The Zoltan and Isorropia Parallel Toolkits for Combinatorial Scientific Computing: Partitioning, Ordering and Coloring

2012 ◽  
Vol 20 (2) ◽  
pp. 129-150 ◽  
Author(s):  
Erik G. Boman ◽  
Ümit V. Çatalyürek ◽  
Cédric Chevalier ◽  
Karen D. Devine

Partitioning and load balancing are important problems in scientific computing that can be modeled as combinatorial problems using graphs or hypergraphs. The Zoltan toolkit was developed primarily for partitioning and load balancing to support dynamic parallel applications, but has expanded to support other problems in combinatorial scientific computing, including matrix ordering and graph coloring. Zoltan is based on abstract user interfaces and uses callback functions. To simplify the use and integration of Zoltan with other matrix-based frameworks, such as the ones in Trilinos, we developed Isorropia as a Trilinos package, which supports most of Zoltan's features via a matrix-based interface. In addition to providing an easy-to-use matrix-based interface to Zoltan, Isorropia also serves as a platform for additional matrix algorithms. In this paper, we give an overview of the Zoltan and Isorropia toolkits, their design, capabilities and use. We also show how Zoltan and Isorropia enable large-scale, parallel scientific simulations, and describe current and future development in the next-generation package Zoltan2.

Author(s):  
Gengbin Zheng ◽  
Abhinav Bhatelé ◽  
Esteban Meneses ◽  
Laxmikant V. Kalé

Large parallel machines with hundreds of thousands of processors are becoming more prevalent. Ensuring good load balance is critical for scaling certain classes of parallel applications on even thousands of processors. Centralized load balancing algorithms suffer from scalability problems, especially on machines with a relatively small amount of memory. Fully distributed load balancing algorithms, on the other hand, tend to take longer to arrive at good solutions. In this paper, we present an automatic dynamic hierarchical load balancing method that overcomes the scalability challenges of centralized schemes and longer running times of traditional distributed schemes. Our solution overcomes these issues by creating multiple levels of load balancing domains which form a tree. This hierarchical method is demonstrated within a measurement-based load balancing framework in Charm++. We discuss techniques to deal with scalability challenges of load balancing at very large scale. We present performance data of the hierarchical load balancing method on up to 16,384 cores of Ranger (at the Texas Advanced Computing Center) and 65,536 cores of Intrepid (the Blue Gene/P at Argonne National Laboratory) for a synthetic benchmark. We also demonstrate the successful deployment of the method in a scientific application, NAMD, with results on Intrepid.


2009 ◽  
Vol 180 ◽  
pp. 012008 ◽  
Author(s):  
Erik G Boman ◽  
Umit V Catalyurek ◽  
Céedric Chevalier ◽  
Karen D Devine ◽  
Ilya Safro ◽  
...  

Author(s):  
Mamadou Diarra ◽  
Telesphore Tiendrebeogo

The advent of Big Data has seen the emergence of new processing and storage challenges. These challenges are often solved by distributed processing. Distributed systems are inherently dynamic and unstable, so it is realistic to expect that some resources will fail during use. Load balancing and task scheduling is an important step in determining the performance of parallel applications. Hence the need to design load balancing algorithms adapted to grid computing. In this paper, we propose a dynamic and hierarchical load balancing strategy at two levels: Intrascheduler load balancing, in order to avoid the use of the large-scale communication network, and interscheduler load balancing, for a load regulation of our whole system. The strategy allows improving the average response time of CLOAK-Reduce application tasks with minimal communication. We first focus on the three performance indicators, namely response time, process latency and running time of MapReduce tasks.


Author(s):  
Mark Endrei ◽  
Chao Jin ◽  
Minh Ngoc Dinh ◽  
David Abramson ◽  
Heidi Poxon ◽  
...  

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload and their effect on performance and energy efficiency are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that only require a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), Large-scale Atomic Molecular Massively Parallel Simulator, and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.


Algorithms ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 154
Author(s):  
Marcus Walldén ◽  
Masao Okita ◽  
Fumihiko Ino ◽  
Dimitris Drikakis ◽  
Ioannis Kokkinakis

Increasing processing capabilities and input/output constraints of supercomputers have increased the use of co-processing approaches, i.e., visualizing and analyzing data sets of simulations on the fly. We present a method that evaluates the importance of different regions of simulation data and a data-driven approach that uses the proposed method to accelerate in-transit co-processing of large-scale simulations. We use the importance metrics to simultaneously employ multiple compression methods on different data regions to accelerate the in-transit co-processing. Our approach strives to adaptively compress data on the fly and uses load balancing to counteract memory imbalances. We demonstrate the method’s efficiency through a fluid mechanics application, a Richtmyer–Meshkov instability simulation, showing how to accelerate the in-transit co-processing of simulations. The results show that the proposed method expeditiously can identify regions of interest, even when using multiple metrics. Our approach achieved a speedup of 1.29× in a lossless scenario. The data decompression time was sped up by 2× compared to using a single compression method uniformly.


Author(s):  
Michael Wessely ◽  
Ticha Sethapakdi ◽  
Carlos Castillo ◽  
Jackson C. Snowden ◽  
Ollie Hanton ◽  
...  

Author(s):  
Hitomi Tamura ◽  
Masato Uchida ◽  
Masato Tsuru ◽  
Jun'ichi Shimada ◽  
Takeshi Ikenaga ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document