Resource-Aware Load Balancing of Parallel Applications

Author(s):  
Eric Aubanel

The problem of load balancing parallel applications is particularly challenging on computational grids, since the characteristics of both the application and the platform must be taken into account. This chapter reviews the wide range of solutions that have been proposed. It considers tightly coupled parallel applications that can be described by an undirected graph representing the concurrent execution of tasks and the communication between them, executing on computational grids with static and dynamic network and processor performance. While a rich set of solution techniques has been proposed, there have not yet been any performance comparisons between them. Such comparisons will require parallel benchmarks and computational grid emulators and simulators.

Author(s):  
Francesco Palmieri ◽  
Ugo Fiore

In the past decade there has been a remarkable change from mainframe-based centralized computing to a distributed client/server approach. In the coming decade this trend is likely to continue with further shifts towards network-centric collaborative computing. At present, the key technology in collaborative computing is the computational grid paradigm. Like an electrical power grid, the computational Grid will aim to provide a steady, reliable source of computing power. More precisely, the term grid is now adopted to designate a common computational and/or data processing infrastructure built on distributed resources that are highly heterogeneous (in their role, computing power and architecture) and interconnected by heterogeneous communication networks. These resources communicate through basic services realized by a middleware stratum that offers a reliable, simple, uniform and often transparent interface, such that an unaware user can submit jobs to the Grid just as if facing a large virtual supercomputer. Large computing endeavors, consisting of one or more related jobs or tasks, are then transparently distributed over the network on the available computing resources. Such a workload distribution strategy, that is, balancing the tasks across idle computers on the underlying networks, is the most important functionality in computational Grids, and is usually provided at the service level of the grid software infrastructure.


Author(s):  
Eduardo H. M. Cruz ◽  
Matthias Diener ◽  
Laércio L. Pilla ◽  
Philippe O. A. Navaux

Current and future architectures rely on thread-level parallelism to sustain performance growth. These architectures have introduced a complex memory hierarchy, consisting of several cores organized hierarchically with multiple cache levels and NUMA nodes. These memory hierarchies can affect the performance and energy efficiency of parallel applications because they increase the importance of memory access locality. In order to improve locality, the analysis of the memory access behavior of parallel applications is critical for mapping threads and data. Nevertheless, most previous work relies on indirect information about the memory accesses, or does not combine thread and data mapping, resulting in less accurate mappings. In this paper, we propose the Sharing-Aware Memory Management Unit (SAMMU), an extension to the memory management unit that allows it to detect the memory access behavior in hardware. With this information, the operating system can perform online mapping without any previous knowledge about the behavior of the application. In the evaluation with a wide range of parallel applications (NAS Parallel Benchmarks and PARSEC Benchmark Suite), performance was improved by up to 35.7% (10.0% on average) and energy efficiency was improved by up to 11.9% (4.1% on average). These improvements resulted from a substantial reduction of cache misses and interconnection traffic.


Author(s):  
Gengbin Zheng ◽  
Abhinav Bhatelé ◽  
Esteban Meneses ◽  
Laxmikant V. Kalé

Large parallel machines with hundreds of thousands of processors are becoming more prevalent. Ensuring good load balance is critical for scaling certain classes of parallel applications on even thousands of processors. Centralized load balancing algorithms suffer from scalability problems, especially on machines with a relatively small amount of memory. Fully distributed load balancing algorithms, on the other hand, tend to take longer to arrive at good solutions. In this paper, we present an automatic dynamic hierarchical load balancing method that overcomes the scalability challenges of centralized schemes and longer running times of traditional distributed schemes. Our solution overcomes these issues by creating multiple levels of load balancing domains which form a tree. This hierarchical method is demonstrated within a measurement-based load balancing framework in Charm++. We discuss techniques to deal with scalability challenges of load balancing at very large scale. We present performance data of the hierarchical load balancing method on up to 16,384 cores of Ranger (at the Texas Advanced Computing Center) and 65,536 cores of Intrepid (the Blue Gene/P at Argonne National Laboratory) for a synthetic benchmark. We also demonstrate the successful deployment of the method in a scientific application, NAMD, with results on Intrepid.


Author(s):  
Annette Volk ◽  
Urmila Ghia

Computational Fluid Dynamics (CFD)-Discrete Element Method (DEM) simulations are designed to model a pseudo-two-dimensional fluidized bed. Bed behavior and accuracy of results are shown to change as the simulations are conducted on increasingly refined computational grids. Trends of the results with grid refinement are reported for both three-dimensional, uniform refinement, and for grid refinement in only the direction of bed thickness. Pseudo-2D simulation results are examined against previously published experimental data to assess relative accuracy compared to fully 3D simulation results. Two drag laws are employed in the simulations, resulting in different trends of results with computational grid refinement. From these results, we present suggestions for accurate model design.


2014 ◽  
Vol 2014 ◽  
pp. 1-13 ◽  
Author(s):  
Mouna Baklouti ◽  
Mohamed Abid

To meet the high performance demands of embedded multimedia applications, embedded systems are integrating multiple processing units. However, they are mostly based on custom-logic design methodology. Designing parallel multicore systems using available standard intellectual property (IP) blocks while maintaining high performance is also a challenging issue. Softcore processors and field programmable gate arrays (FPGAs) are a cheap and fast option to develop and test such systems. This paper describes an FPGA-based design methodology to implement a rapid prototype of parametric multicore systems. A study of the viability of making the SoC using the NIOS II soft-processor core from Altera is also presented. The NIOS II features a general-purpose RISC CPU architecture designed to address a wide range of applications. The performance of the implemented architecture is discussed, and some parallel applications are used to test the speedup and efficiency of the system. Experimental results demonstrate the performance of the proposed multicore system, which achieves better speedup than the GPU (29.5% faster for the FIR filter and 23.6% faster for the matrix-matrix multiplication).


Author(s):  
Zahid Raza ◽  
Deo P. Vidyarthi

The computational grid, characterized by distributed load sharing, has evolved as a platform for large-scale problem solving. A grid is a collection of heterogeneous resources, offering services of varying natures, in which jobs are submitted to any of the participating nodes. Scheduling these jobs in such a complex and dynamic environment poses many challenges. Reliability analysis of the grid is of paramount importance because a grid involves a large number of resources that may fail at any time, making it unreliable. These failures waste both computational power and money on scarce grid resources. It is normally desired that a job be scheduled in an environment that ensures maximum reliability of its execution. This work presents a reliability-based scheduling model for jobs on the computational grid. The model considers the failure rates of both the software and hardware grid constituents, such as the application demanding execution, the nodes executing the job, and the network links supporting data exchange between the nodes. Job allocation using the proposed scheme becomes trusted, as it schedules the job based on a priori reliability computation.
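
To make the idea of a priori reliability computation concrete, the following is a minimal sketch and not the authors' model. It assumes independent, constant (exponential) failure rates for the application, the node, and the link, so that the three rates simply add; the function names, node names, and numeric values are all hypothetical.

```python
import math

def job_reliability(exec_time, app_rate, node_rate, link_rate):
    """Probability that application, node, and link all survive exec_time.

    Assumption (not from the source): independent components with constant
    failure rates, so reliability is exp(-(sum of rates) * time).
    """
    return math.exp(-(app_rate + node_rate + link_rate) * exec_time)

def pick_most_reliable(nodes, app_rate):
    """Return the name of the node with the highest a priori job reliability.

    nodes maps a node name to (exec_time, node_rate, link_rate).
    """
    return max(
        nodes,
        key=lambda name: job_reliability(
            nodes[name][0], app_rate, nodes[name][1], nodes[name][2]
        ),
    )

# Hypothetical candidates: a fast node with high failure rates and a
# slower node with low failure rates.
nodes = {
    "fast_flaky": (10.0, 0.02, 0.01),
    "slow_stable": (20.0, 0.001, 0.001),
}
best = pick_most_reliable(nodes, app_rate=0.001)
```

Under these assumed numbers the slower node is selected, because its lower failure rates outweigh its longer execution time.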


Author(s):  
João Phellipe ◽  
Carla Katarina ◽  
Francisco das Chagas ◽  
Dario Aloise

Computer processing power has evolved considerably in recent years. However, there are problems that still require many machines to perform a large amount of processing in a parallel and distributed way. In this context, many algorithms have been proposed for task scheduling in distributed systems. In this chapter, the authors present a scheduler based on genetic algorithms that distributes tasks more efficiently in a computational grid; it has been implemented in GRIDSIM, a computational grid simulator with the features and attributes of a real grid.
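
As an illustration of how a genetic algorithm can assign tasks to machines, here is a minimal standalone sketch. It is not the authors' scheduler and does not use the GRIDSIM API; the makespan fitness, truncation selection, one-point crossover, and all parameter values are assumptions chosen for brevity.

```python
import random

def makespan(assignment, task_lengths, num_machines):
    """Finish time of the busiest machine for a task-to-machine assignment."""
    loads = [0.0] * num_machines
    for task, machine in enumerate(assignment):
        loads[machine] += task_lengths[task]
    return max(loads)

def ga_schedule(task_lengths, num_machines, pop_size=30, generations=100,
                mutation_rate=0.1, seed=0):
    """Search for a low-makespan assignment with a simple genetic algorithm.

    A chromosome is a list whose i-th gene is the machine running task i.
    Requires at least two tasks and a population of at least four.
    """
    rng = random.Random(seed)  # seeded for reproducible runs
    n = len(task_lengths)

    def fitness(assignment):
        return makespan(assignment, task_lengths, num_machines)

    population = [[rng.randrange(num_machines) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the better half unchanged (elitism).
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            for i in range(n):  # per-gene random mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(num_machines)
            children.append(child)
        population = survivors + children
    return min(population, key=fitness)

# Hypothetical workload: eight tasks on three identical machines.
tasks = [4, 7, 2, 9, 5, 3, 8, 6]
best = ga_schedule(tasks, num_machines=3)
best_makespan = makespan(best, tasks, 3)
```

Because the better half of each generation is carried over unchanged, the best makespan found never gets worse from one generation to the next. A real grid scheduler would also need to model heterogeneous machine speeds and communication costs, which this sketch omits.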


2014 ◽  
Vol 2014 ◽  
pp. 1-13 ◽  
Author(s):  
Seung-Jae Lee ◽  
Jun-Hyeok Lee ◽  
Jung-Chun Suh

The vorticity-velocity formulation of the Navier-Stokes equations allows purely kinematical problems to be decoupled from the pressure term, since the pressure is eliminated by applying the curl operator. The Vortex-In-Cell (VIC) method, which is based on the vorticity-velocity formulation, offers particle-mesh algorithms to numerically simulate flows past a solid body. The penalization method is used to enforce boundary conditions at a body surface with a decoupling between body boundaries and computational grids. Its main advantage is a highly efficient implementation for solid boundaries of arbitrary complexity on Cartesian grids. We present an efficient algorithm to numerically implement the vorticity-velocity-pressure formulation including a penalty term to simulate the pressure fields around a solid body. In vorticity-based methods, the pressure field can be computed independently of the solution procedure for vorticity. This clearly simplifies the implementation and reduces the computational cost. Obtaining the pressure field at any fixed time represents the most challenging goal of this study. We validate the implementation by numerical simulations of an incompressible viscous flow around an impulsively started circular cylinder over a wide range of Reynolds numbers: Re=40, 550, 3000, and 9500.


2020 ◽  
Vol 12 (22) ◽  
pp. 9340
Author(s):  
Md. Sanwar Hossain ◽  
Khondoker Ziaul Islam ◽  
Abu Jahid ◽  
Khondokar Mizanur Rahman ◽  
Sarwar Ahmed ◽  
...  

With the proliferation of cellular networks, the ubiquitous availability of new-generation multimedia devices, and their wide-ranging data applications, telecom network operators are deploying an increasing number of cellular base stations (BSs) to deal with unprecedented service demand. The rapid and radical deployment of cellular networks significantly increases energy consumption and the carbon footprint. The ultimate objective of this work is to develop a sustainable and environmentally friendly cellular infrastructure through effective utilization of the locally available renewable energy sources (RES), namely solar photovoltaic (PV), wind turbine (WT), and biomass generator (BG). This article addresses the key challenges of envisioning hybrid solar PV/WT/BG powered macro BSs in Bangladesh, considering the dynamic profile of the RES and traffic intensity in the tempo-spatial domain. The optimal system architecture and technical criteria of the proposed system are critically evaluated with the help of HOMER optimization software for both on-grid and off-grid conditions to reduce the electricity generation cost and waste outflows while ensuring the desired quality of experience (QoE) over a 20-year duration. In addition, the green energy-sharing mechanism under the off-grid and grid-tied conditions has been critically analyzed for optimal use of green energy. Moreover, a heuristic load balancing algorithm among collocated BSs has been incorporated to improve throughput and energy efficiency (EE). The spectral efficiency (SE), energy efficiency, and outage probability performance of the contemplated wireless network are examined using Matlab-based Monte Carlo simulation under a wide range of network configurations. Simulation results reveal that the proper load balancing technique achieves zero outage probability with the expected system performance, whereas the energy cooperation policy offers an attractive solution for developing green mobile communications through better utilization of renewable energy under the proposed hybrid solar PV/WT/BG scheme.

