Energy Idle Aware Stochastic Lexicographic Local Searches for Precedence-Constraint Task List Scheduling on Heterogeneous Systems

The use of parallel applications in High-Performance Computing (HPC) demands high computing times and energy resources. Inadequate scheduling produces longer computing times which, in turn, increase energy consumption and monetary cost. Task scheduling is an NP-hard problem; thus, several heuristic methods appear in the literature. The main approaches can be grouped into three categories: fast heuristics, metaheuristics, and local search. Fast heuristics are used when pre-scheduling time is short, and metaheuristics when it is long. Local search is commonly used when pre-scheduling time is limited by CPU seconds or by objective function evaluations. This paper focuses on optimizing the scheduling of parallel applications, considering the energy consumed during idle time, when no tasks are executing. Additionally, we detail a comparative literature study of the performance of lexicographic variants with local searches adapted to be stochastic and aware of idle energy consumption.
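The idea of charging a schedule for idle time can be illustrated with a minimal sketch. The abstract does not give the paper's energy model, so everything here is an assumption: the `Slot` type, constant per-processor `busy_power` and `idle_power` values, and the choice to rank makespan before energy in the lexicographic comparison are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One task execution on one processor (hypothetical structure)."""
    processor: int
    start: float
    end: float


def schedule_energy(slots, busy_power, idle_power):
    """Return (makespan, energy) for a schedule.

    Assumed model: each processor draws busy_power[p] while running a
    task and idle_power[p] for the rest of the time up to the makespan.
    """
    makespan = max(s.end for s in slots)
    busy_time = [0.0] * len(busy_power)
    for s in slots:
        busy_time[s.processor] += s.end - s.start
    energy = 0.0
    for p, busy in enumerate(busy_time):
        idle = makespan - busy
        energy += busy_power[p] * busy + idle_power[p] * idle
    return makespan, energy


def lexicographic_better(a, b):
    """True if objective tuple a beats b: first component, then second."""
    return a < b  # Python compares tuples lexicographically


slots = [Slot(0, 0.0, 4.0), Slot(1, 0.0, 2.0), Slot(1, 3.0, 5.0)]
print(schedule_energy(slots, busy_power=[10.0, 8.0], idle_power=[2.0, 1.0]))
```

A local search under this sketch would accept a neighbouring schedule only when `lexicographic_better` holds for its `(makespan, energy)` pair; swapping the tuple order would prioritize energy instead.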

Energy-Efficient Reliability-Aware Scheduling Algorithm on Heterogeneous Systems

Scientific Programming ◽

10.1155/2016/9823213 ◽

2016 ◽

Vol 2016 ◽

pp. 1-13 ◽

Cited By ~ 3

Author(s):

Xiaoyong Tang ◽

Weizhen Tan

Keyword(s):

Energy Consumption ◽

System Performance ◽

High Performance ◽

Scheduling Algorithm ◽

Heterogeneous Systems ◽

High Energy ◽

Parallel Applications ◽

Energy Aware ◽

Simulation Performance ◽

Energy Aware Scheduling

The amount of energy needed to operate high-performance computing systems has been increasing rapidly for several years, and energy consumption has attracted a great deal of attention. Moreover, high energy consumption inevitably leads to failures and reduces system reliability. However, there has been considerably less work on the simultaneous management of system performance, reliability, and energy consumption on heterogeneous systems. In this paper, we first build a model of precedence-constrained parallel applications and of energy consumption. Then, we deduce the relation between reliability and processor frequency and approximate its parameters by least-squares curve fitting. Third, we establish a task execution reliability model and formulate the reliability- and energy-aware scheduling problem as a linear program. Lastly, we propose a heuristic Reliability-Energy Aware Scheduling (REAS) algorithm to solve this problem, which achieves a good tradeoff among system performance, reliability, and energy consumption with lower complexity. Our extensive simulation-based performance evaluation clearly demonstrates the tradeoff performance of the proposed heuristic algorithm.
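The least-squares fitting step can be illustrated with a small sketch. The abstract does not state the functional form relating reliability to frequency, so the log-linear model below (failure rate = exp(a + b·f)) is purely an assumption for illustration, as are the function name and the synthetic data.

```python
import math


def fit_log_linear(freqs, rates):
    """Least-squares fit of log(rate) = a + b * freq; returns (a, b).

    The log-linear form is an assumed model, not the one from the paper.
    """
    ys = [math.log(r) for r in rates]
    n = len(freqs)
    mean_x = sum(freqs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in freqs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(freqs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic measurements generated from a = -10, b = -2 (hypothetical).
freqs = [0.5, 0.75, 1.0, 1.25]
rates = [math.exp(-10.0 - 2.0 * f) for f in freqs]
a, b = fit_log_linear(freqs, rates)
print(round(a, 6), round(b, 6))
```

With noise-free synthetic data the fit recovers the generating parameters; with measured failure rates it would return the best approximation in the least-squares sense.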

GRASP and Iterated Local Search-Based Cellular Processing algorithm for Precedence-Constraint Task List Scheduling on Heterogeneous Systems

Applied Sciences ◽

10.3390/app10217500 ◽

2020 ◽

Vol 10 (21) ◽

pp. 7500

Author(s):

Alejandro Santiago ◽

J. David Terán-Villanueva ◽

Salvador Ibarra Martínez ◽

José Antonio Castán Rocha ◽

Julio Laria Menchaca ◽

...

Keyword(s):

Local Search ◽

High Performance ◽

Heterogeneous Systems ◽

Search Space ◽

Iterated Local Search ◽

Detection Mechanism ◽

Task List ◽

Np Hard Problem ◽

Cellular Processing ◽

The Individual

High-Performance Computing systems rely on the software’s capability to be highly parallelized into individual computing tasks. However, even with a high parallelization level, poor scheduling can lead to long runtimes; this scheduling is in itself an NP-hard problem. Therefore, we are interested in a heuristic approach, particularly Cellular Processing Algorithms (CPA), a novel metaheuristic framework for optimization. This framework is founded on exploring the search space with multiple Processing Cells that communicate to exploit the search, and on an individual stagnation detection mechanism in each Processing Cell. In this paper, we propose using a Greedy Randomized Adaptive Search Procedure (GRASP) to look for promising task execution orders; later, a CPA formed with Iterated Local Search (ILS) Processing Cells is used for the optimization. We assess our approach against a high-performance state-of-the-art ILS approach. Experimental results show that the CPA outperforms the previous ILS in real applications and synthetic instances.
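The GRASP construction of a task execution order can be sketched as follows. This is a generic GRASP with a value-based restricted candidate list, not the authors' implementation; the `priority` values, the `alpha` parameter, and the function name are assumptions, and the input is assumed to be an acyclic precedence graph.

```python
import random


def grasp_order(preds, priority, alpha, rng):
    """Build a precedence-feasible task order with a GRASP construction.

    preds maps each task to the tasks that must run before it (a DAG is
    assumed). alpha in [0, 1] controls randomness: 0 is purely greedy,
    1 picks uniformly among all ready tasks.
    """
    remaining = {t: set(p) for t, p in preds.items()}
    order = []
    while remaining:
        # Ready tasks have no unscheduled predecessors.
        ready = [t for t, p in remaining.items() if not p]
        best = max(priority[t] for t in ready)
        worst = min(priority[t] for t in ready)
        threshold = best - alpha * (best - worst)
        # Restricted candidate list: ready tasks close enough to the best.
        rcl = [t for t in ready if priority[t] >= threshold]
        chosen = rng.choice(rcl)
        order.append(chosen)
        del remaining[chosen]
        for p in remaining.values():
            p.discard(chosen)
    return order


preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
priority = {"a": 4, "b": 3, "c": 2, "d": 1}  # hypothetical task ranks
print(grasp_order(preds, priority, alpha=0.5, rng=random.Random(0)))
```

Each order produced this way could then seed a local search such as ILS; repeated calls with different random states give diverse starting points.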

A Hybrid Resource Reservation Method for Workflows in Clouds

International Journal of Grid and High Performance Computing ◽

10.4018/jghpc.2012100101 ◽

2012 ◽

Vol 4 (4) ◽

pp. 1-21

Author(s):

Tyng-Yeu Liang ◽

Fu-Chun Lu ◽

Jun-Yao Chiu

Keyword(s):

Cloud Computing ◽

Energy Consumption ◽

High Performance ◽

Resource Reservation ◽

Scientific Workflows ◽

Service Oriented ◽

Time And Energy ◽

Gpu Architecture ◽

Performance Computing ◽

Oriented System

QoS and energy consumption are two important issues for Cloud computing. In this paper, the authors propose a hybrid resource reservation method to address these two issues for scientific workflows in the high-performance computing Clouds built on hybrid CPU/GPU architecture. As named, this method reserves proper CPU or GPU for executing different jobs in the same workflow based on the profile of execution time and energy consumption of each resource-to-program pair. They have implemented the proposed resource reservation method on a real service-oriented system. The experimental results show that the proposed resource reservation method can effectively maintain the QoS of workflows while simultaneously minimizing the energy consumption of executing the workflows.

The Sicilian Grid Infrastructure for High Performance Computing

International Journal of Distributed Systems and Technologies ◽

10.4018/jdst.2010090803 ◽

2010 ◽

Vol 1 (1) ◽

pp. 40-54 ◽

Cited By ~ 1

Author(s):

Carmelo Marcello Iacono-Manno ◽

Marco Fargetta ◽

Roberto Barbera ◽

Alberto Falzone ◽

Giuseppe Andronico ◽

...

Keyword(s):

High Performance Computing ◽

High Performance ◽

Parallel Applications ◽

Grid Infrastructure ◽

Scheduling Policy ◽

Computing Paradigm ◽

Regional Area ◽

Computer Fluid Dynamics ◽

Grid Infrastructures ◽

Performance Computing

The combination of High Performance Computing (HPC) and the Grid paradigm with applications based on commercial software is one of the major challenges of today's e-Infrastructures. Several research communities from both industry and academia need to run highly parallel applications based on licensed software over hundreds of CPU cores; satisfactory fulfillment of such requests is one of the keys to the penetration of this computing paradigm into the industry world and to the sustainability of Grid infrastructures. This problem has been tackled in the context of the PI2S2 project, which created a regional e-Infrastructure in Sicily, the first in Italy over a regional area. The present article describes the features added in order to integrate an HPC facility into the PI2S2 Grid infrastructure: the adoption of the InfiniBand low-latency network connection, the gLite middleware extended to support MPI/MPI2 jobs, the newly developed license server, and the specific scheduling policy adopted. Moreover, it shows the results of some relevant use cases from Computational Fluid Dynamics (Fluent, OpenFOAM), Chemistry (GAMESS), Astrophysics (Flash), and Bioinformatics (ClustalW).

Service for parallel applications based on JINR cloud and HybriLIT resources

EPJ Web of Conferences ◽

10.1051/epjconf/201921407012 ◽

2019 ◽

Vol 214 ◽

pp. 07012 ◽

Cited By ~ 1

Author(s):

Nikita Balashov ◽

Maxim Bashashin ◽

Pavel Goncharov ◽

Ruslan Kuchumov ◽

Nikolay Kutovskiy ◽

...

Keyword(s):

High Performance ◽

Cloud Service ◽

Parallel Applications ◽

Cloud Infrastructure ◽

Modular Architecture ◽

Practical Applications ◽

Speed Up ◽

Scientific Results ◽

Computational Resources ◽

Performance Computing

Cloud computing has become a routine tool for scientists in many fields. The JINR cloud infrastructure provides JINR users with computational resources to perform various scientific calculations. To speed up the achievement of scientific results, the JINR cloud service for parallel applications has been developed. It consists of several components and implements a flexible and modular architecture that allows the use of both additional applications and various types of resources as computational backends. An example of using the Cloud&HybriLIT resources in scientific computing is the study of superconducting processes in stacked long Josephson junctions (LJJ). LJJ systems have undergone intensive research because of the prospect of practical applications in nano-electronics and quantum computing. In this contribution we generalize the experience of applying the Cloud&HybriLIT resources to high-performance computation of physical characteristics in the LJJ system.

A Non-intrusive Methodology to Improve the Performance of Parallel Applications in High Performance Computing

2012 41st International Conference on Parallel Processing Workshops ◽

10.1109/icppw.2012.56 ◽

2012 ◽

Author(s):

Fernando H.P. Luz ◽

Denis Taniguchi ◽

Liria M. Sato

Keyword(s):

High Performance Computing ◽

High Performance ◽

Parallel Applications ◽

Performance Computing

The Sicilian Grid Infrastructure for High Performance Computing

Technology Integration Advancements in Distributed Systems and Computing ◽

10.4018/978-1-4666-0906-8.ch013 ◽

2012 ◽

pp. 215-227

Author(s):

Carmelo Marcello Iacono-Manno ◽

Marco Fargetta ◽

Roberto Barbera ◽

Alberto Falzone ◽

Giuseppe Andronico ◽

...

Keyword(s):

High Performance Computing ◽

High Performance ◽

Parallel Applications ◽

Grid Infrastructure ◽

Scheduling Policy ◽

Computing Paradigm ◽

Regional Area ◽

Computer Fluid Dynamics ◽

Grid Infrastructures ◽

Performance Computing

The combination of High Performance Computing (HPC) and the Grid paradigm with applications based on commercial software is one of the major challenges of today's e-Infrastructures. Several research communities from both industry and academia need to run highly parallel applications based on licensed software over hundreds of CPU cores; satisfactory fulfillment of such requests is one of the keys to the penetration of this computing paradigm into the industry world and to the sustainability of Grid infrastructures. This problem has been tackled in the context of the PI2S2 project, which created a regional e-Infrastructure in Sicily, the first in Italy over a regional area. The present paper describes the features added in order to integrate an HPC facility into the PI2S2 Grid infrastructure: the adoption of the InfiniBand low-latency network connection, the gLite middleware extended to support MPI/MPI2 jobs, the newly developed license server, and the specific scheduling policy adopted. Moreover, it shows the results of some relevant use cases from Computational Fluid Dynamics (Fluent, OpenFOAM), Chemistry (GAMESS), Astrophysics (Flash), and Bioinformatics (ClustalW).

Accurate Energy and Performance Prediction for Frequency-Scaled GPU Kernels

Computation ◽

10.3390/computation8020037 ◽

2020 ◽

Vol 8 (2) ◽

pp. 37

Author(s):

Kaijie Fan ◽

Biagio Cosenza ◽

Ben Juurlink

Keyword(s):

Energy Consumption ◽

Performance Prediction ◽

High Performance ◽

Pareto Set ◽

Large Set ◽

Balance Performance ◽

Multi Objective ◽

Dynamic Voltage ◽

And Performance ◽

Performance Computing

Energy optimization is an increasingly important aspect of today’s high-performance computing applications. In particular, dynamic voltage and frequency scaling (DVFS) has become a widely adopted solution to balance performance and energy consumption, and hardware vendors provide management libraries that allow the programmer to change both memory and core frequencies manually to minimize energy consumption while maximizing performance. This article focuses on modeling the energy consumption and speedup of GPU applications under different frequency configurations. The task is not straightforward because of the large set of possible and uniformly distributed configurations and because of the multi-objective nature of the problem, which minimizes energy consumption and maximizes performance. This article proposes a machine learning-based method to predict the best core and memory frequency configurations on GPUs for an input OpenCL kernel. The method is based on two models that predict speedup and normalized energy relative to the default frequency configuration. These are later combined into a multi-objective approach that predicts a Pareto set of frequency configurations. Results show that our approach is very accurate at predicting extrema and the Pareto set, and finds frequency configurations that dominate the default configuration in either energy or performance.
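The final step, turning predicted speedup and normalized energy into a Pareto set, can be illustrated with a plain dominance filter. This sketch is not the article's method: the configuration names and the predicted values are made up, and the predictions are taken as given rather than produced by a learned model.

```python
def pareto_set(configs):
    """Return the names of non-dominated frequency configurations.

    configs maps a configuration name to (speedup, normalized_energy).
    Higher speedup is better and lower energy is better; a configuration
    is dominated if another is at least as good in both objectives and
    strictly better in one.
    """
    front = []
    for name, (s, e) in configs.items():
        dominated = any(
            s2 >= s and e2 <= e and (s2 > s or e2 < e)
            for other, (s2, e2) in configs.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical predictions relative to the default configuration (1.0, 1.0).
predicted = {
    "default": (1.00, 1.00),
    "high_core": (1.20, 1.10),
    "low_mem": (0.90, 0.80),
    "mid": (0.95, 1.05),
    "high_both": (1.10, 1.15),
}
print(pareto_set(predicted))
```

In this made-up data, `mid` is dominated by `default` and `high_both` by `high_core`, so three configurations remain as candidate tradeoffs.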

Implementation and evaluation of the HPC challenge benchmark in the XcalableMP PGAS language

The International Journal of High Performance Computing Applications ◽

10.1177/1094342017698214 ◽

2017 ◽

Vol 33 (1) ◽

pp. 110-123 ◽

Cited By ~ 5

Author(s):

Masahiro Nakao ◽

Hitoshi Murai ◽

Hidetoshi Iwashita ◽

Taisuke Boku ◽

Mitsuhisa Sato

Keyword(s):

High Performance Computing ◽

High Performance ◽

Parallel Applications ◽

Memory Model ◽

Computing Systems ◽

Local View ◽

And Performance ◽

Performance Results ◽

Performance Computing ◽

Do So

To improve productivity for developing parallel applications on high performance computing systems, the XcalableMP PGAS language has been proposed. XcalableMP supports both a typical parallelization under the “global-view memory model” which uses directives and a flexible parallelization under the “local-view memory model” which uses coarray features. The goal of the present paper is to clarify XcalableMP’s productivity and performance. To do so, we implement and evaluate the high performance computing challenge benchmark, namely, EP STREAM Triad, High Performance Linpack, Global fast Fourier transform, and RandomAccess on the K computer using up to 16,384 compute nodes and a generic cluster system using up to 128 compute nodes. We found that we could more easily implement the benchmarks using XcalableMP rather than using MPI. Moreover, most of the performance results using XcalableMP were almost the same as those using MPI.

Análise de Características Comportamentais de Aplicações OpenMP para Redução do Consumo de Energia

10.5753/wperformance.2018.3346 ◽

2018 ◽

Author(s):

Gabriel B. Moro ◽

Lucas Mello Schnorr

Keyword(s):

Operating System ◽

Energy Consumption ◽

High Performance Computing ◽

High Performance ◽

Previous Analysis ◽

Computer Systems ◽

Specific Knowledge ◽

Reduce Energy Consumption ◽

Performance Computing ◽

Processor Frequency

Performance and energy consumption are fundamental requirements in computer systems. A very frequent challenge is to combine both aspects, seeking to maintain high computing performance while consuming less energy. There are many techniques to reduce energy consumption, but in general they rely on modern processor resources or require specific knowledge about the application and platform used. In this paper, we propose a library that dynamically changes the processor frequency according to the application's computing behavior, using a previous analysis of its memory-bound regions. The results show a reduction of 1.89% in energy consumption for the Lulesh application with an increase of 0.09% in runtime when we compare our approach against the Ondemand governor of the Linux operating system.