Reducing the energy consumption of large-scale computing systems through combined shutdown policies with multiple constraints

Author(s): Anne Benoit, Laurent Lefèvre, Anne-Cécile Orgerie, Issam Raïs

Large-scale distributed systems (high-performance computing centers, networks, data centers) are expected to consume enormous amounts of energy. To address this issue, shutdown policies constitute an appealing approach: they dynamically adapt the set of powered-on resources to the actual workload. However, multiple constraints have to be taken into account for such policies to be applied on real infrastructures: the time and energy cost of switching resources on and off, the bounds on power and energy consumption imposed by the electricity grid or the cooling system, and the availability of renewable energy. In this article, we propose models translating these various constraints into different shutdown policies that can be combined to satisfy multiple constraints simultaneously. Our models and their combinations are validated through simulations on a real workload trace.
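
As an illustration only (these are not the authors' models), the following sketch shows the classical break-even rule that underlies most shutdown policies: a node is switched off only when the energy saved during the predicted idle period outweighs the cost of the off/on cycle. All numbers are hypothetical.

```python
def should_shutdown(predicted_idle_s, p_idle_w, e_off_on_j, t_off_on_s):
    """Classical break-even rule: switch a node off only if the energy saved
    while idle exceeds the energy spent on the off/on cycle, and the idle
    period is long enough to complete that cycle.

    predicted_idle_s : expected idle duration (s)
    p_idle_w         : idle power draw of the node (W)
    e_off_on_j       : energy cost of a full shutdown + boot cycle (J)
    t_off_on_s       : time cost of that cycle (s)
    """
    # Idle time beyond which shutting down pays off energetically.
    t_break_even = e_off_on_j / p_idle_w
    return predicted_idle_s > max(t_break_even, t_off_on_s)

# Hypothetical node: 30 min predicted idle, 80 W idle power, 20 kJ / 150 s per cycle.
print(should_shutdown(1800, 80.0, 20_000, 150))  # True: 1800 s > 250 s break-even
```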

Author(s): Tyng-Yeu Liang, Fu-Chun Lu, Jun-Yao Chiu

QoS and energy consumption are two important issues for Cloud computing. In this paper, the authors propose a hybrid resource reservation method to address these two issues for scientific workflows in high-performance computing Clouds built on hybrid CPU/GPU architectures. As its name suggests, this method reserves the appropriate CPU or GPU for each job in a workflow, based on profiles of the execution time and energy consumption of each resource-to-program pair. The authors have implemented the proposed method on a real service-oriented system. The experimental results show that it effectively maintains the QoS of workflows while minimizing the energy consumed in executing them.
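
The paper does not spell out its reservation algorithm here; as a hedged sketch of the general idea, the code below picks a resource per job from hypothetical (time, energy) profiles, starting from the most energy-efficient assignment and upgrading jobs until a workflow deadline (the QoS target) is met. The job names, profile numbers, serialized-workflow assumption, and greedy rule are all illustrative, not the authors' method.

```python
# Hypothetical profiles: (time_s, energy_j) for each (job, resource) pair.
profiles = {
    ("fft",    "cpu"): (120.0, 9_000), ("fft",    "gpu"): (25.0, 4_500),
    ("filter", "cpu"): ( 40.0, 2_800), ("filter", "gpu"): (15.0, 3_600),
}

def reserve(jobs, deadline_s):
    """Greedy sketch: start from the most energy-efficient choice per job,
    then upgrade the job with the best time-saved-per-extra-joule ratio
    until the (serialized) workflow meets its deadline."""
    choice = {j: min(("cpu", "gpu"), key=lambda r: profiles[(j, r)][1])
              for j in jobs}
    def total_time():
        return sum(profiles[(j, choice[j])][0] for j in jobs)
    while total_time() > deadline_s:
        candidates = []
        for j in jobs:
            other = "gpu" if choice[j] == "cpu" else "cpu"
            dt = profiles[(j, choice[j])][0] - profiles[(j, other)][0]
            de = profiles[(j, other)][1] - profiles[(j, choice[j])][1]
            if dt > 0:  # switching actually saves time
                candidates.append((dt / max(de, 1e-9), j, other))
        if not candidates:
            raise RuntimeError("deadline unreachable with these profiles")
        _, j, other = max(candidates)
        choice[j] = other
    return choice

print(reserve(["fft", "filter"], deadline_s=60.0))  # {'fft': 'gpu', 'filter': 'gpu'}
```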


2020, Vol 23 (1-4)
Author(s): Ruth Schöbel, Robert Speck

Abstract: To extend prevailing scaling limits when solving time-dependent partial differential equations, the parallel full approximation scheme in space and time (PFASST) has been shown to be a promising parallel-in-time integrator. Similar to space–time multigrid, PFASST computes multiple time-steps simultaneously and is therefore particularly suitable for large-scale applications on high-performance computing systems. In this work we couple PFASST with a parallel spectral deferred correction (SDC) method, forming an unprecedented doubly time-parallel integrator. While PFASST provides global, large-scale “parallelization across the step”, the inner parallel SDC method integrates each individual time-step “parallel across the method” using a diagonalized local quasi-Newton solver. This new method, which we call “PFASST with Enhanced concuRrency” (PFASST-ER), therefore exposes even more temporal concurrency. For two challenging nonlinear reaction-diffusion problems, we show that PFASST-ER is more efficient than the classical variants of PFASST and can use more processors than time-steps.
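
PFASST and parallel SDC themselves are too involved to reproduce here; as a minimal sketch of the “parallel across the method” idea only, the code below runs plain Picard iteration on collocation nodes: each node update depends solely on the previous iterate, so all M node updates within a time-step could execute concurrently. The node choice and the test problem are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def collocation_weights(nodes):
    """Q[m, j] = integral of the j-th Lagrange basis polynomial from 0 to nodes[m]."""
    M = len(nodes)
    Q = np.zeros((M, M))
    for j in range(M):
        e = np.zeros(M); e[j] = 1.0
        c = np.polyfit(nodes, e, M - 1)   # j-th Lagrange basis in coefficient form
        ci = np.polyint(c)                # its antiderivative
        Q[:, j] = np.polyval(ci, nodes) - np.polyval(ci, 0.0)
    return Q

def picard_step(f, u0, dt, nodes, sweeps):
    """One time-step of Picard iteration on collocation nodes:
    u_m <- u0 + dt * sum_j Q[m, j] * f(u_j).
    Every node update reads only the previous iterate, so all updates
    can run concurrently ("parallel across the method")."""
    Q = collocation_weights(nodes)
    u = np.full(len(nodes), u0, dtype=float)
    for _ in range(sweeps):
        u = u0 + dt * Q @ f(u)            # all node updates at once
    return u[-1]                          # value at the right endpoint

# Dahlquist test problem u' = -u, exact solution exp(-t).
nodes = np.array([0.2, 0.5, 0.8, 1.0])    # normalized nodes in (0, 1]
u1 = picard_step(lambda u: -u, 1.0, 0.1, nodes, sweeps=8)
print(u1, np.exp(-0.1))                   # ~0.904837 in both cases
```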


2012, Vol 63 (2), pp. 365-377
Author(s): Yulai Yuan, Yongwei Wu, Qiuping Wang, Guangwen Yang, Weimin Zheng

2021, Vol 11 (3), pp. 1169
Author(s): Erol Gelenbe, Miltiadis Siavvas

Long-running software may operate on hardware platforms with limited energy resources, such as batteries or photovoltaic cells, or on high-performance platforms that consume large amounts of energy. Since such systems may be subject to hardware failures, checkpointing is often used to ensure the reliability of the application. Because checkpointing itself adds computation time and energy consumption, we study how checkpoint intervals should be selected so as to minimize a cost function that includes both execution time and energy. Expressions for the program’s energy consumption and execution time are derived as functions of the failure probability per instruction. A first-principles analysis yields the checkpoint interval that minimizes a linear combination of the average energy consumption and execution time of the program, expressed in terms of the classical Lambert W function. The sensitivity of the optimal checkpoint interval to the importance attributed to energy consumption is also derived. The results are illustrated with numerical examples for programs of various lengths, showing the relation between the checkpoint interval that minimizes energy consumption and execution time and the one that minimizes a weighted sum of the two. In addition, our results are applied to a popular software benchmark and posted on a publicly accessible web site, together with the optimization software that we have developed.
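
The paper's closed-form interval involves the Lambert W function; as an illustration only, the sketch below uses the classical first-order checkpoint model (Young's approximation) and recovers its optimum numerically. The cost model and all parameter values here are assumptions for the sketch, not the authors' expressions, and one could extend the cost to a weighted sum of time and energy as the paper does.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def waste(T, C, lam):
    """Classical first-order overhead model: with checkpoint cost C (s) and
    failure rate lam (1/s), interval T loses C/T to checkpointing and, on
    average, lam * T/2 to recomputation after failures."""
    return C / T + lam * T / 2.0

C, lam = 60.0, 1e-4   # hypothetical: 60 s checkpoints, one failure per ~2.8 h
res = minimize_scalar(waste, bounds=(1.0, 1e6), method="bounded", args=(C, lam))
print(res.x)                   # numeric optimum
print(np.sqrt(2 * C / lam))    # Young's closed form sqrt(2C/lambda) ~ 1095 s
```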


Author(s): Juan P. Silva, Ernesto Dufrechou, Pablo Ezzatti, Enrique S. Quintana-Ortí, Alfredo Remón, ...

The high-performance computing community has traditionally focused solely on reducing execution time, but in recent years the optimization of energy consumption has become a major concern. Reducing energy usage without degrading performance requires energy-efficient hardware platforms accompanied by energy-aware algorithms and computational kernels. The solution of linear systems is a key operation in many scientific and engineering problems; its relevance has motivated a substantial body of work, and consequently high-performance solvers exist for a wide variety of hardware platforms. In this work, we develop a high-performance, energy-efficient linear system solver. In particular, we develop two solvers for a low-power CPU-GPU platform, the NVIDIA Jetson TK1. These solvers implement the Gauss-Huard algorithm, yielding efficient usage of the target hardware as well as efficient memory access. The experimental evaluation shows that the new solvers deliver significant savings in both time and energy consumption when compared with the state-of-the-art solvers for this platform.
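
For reference, here is a minimal NumPy sketch of Gauss-Huard elimination on the augmented system: a column-by-column Gauss-Jordan variant that matches the flop count of Gaussian elimination. Pivoting is omitted for clarity, and this is of course far simpler than the paper's CPU-GPU implementation.

```python
import numpy as np

def gauss_huard_solve(A, b):
    """Solve A x = b by Gauss-Huard elimination on the augmented matrix
    [A | b]. At step k only rows 0..k are touched; afterwards the
    augmented column holds the solution. No pivoting (sketch only)."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])
    for k in range(n):
        M[k, k:] -= M[k, :k] @ M[:k, k:]   # eliminate left of diagonal in row k
        M[k, k + 1:] /= M[k, k]            # scale row k by its pivot
        M[:k, k + 1:] -= np.outer(M[:k, k], M[k, k + 1:])  # annihilate column k above
    return M[:, n]                          # augmented column now holds x

A = np.array([[2.0, 1.0], [4.0, 3.0]])
b = np.array([3.0, 7.0])
print(gauss_huard_solve(A, b))             # [1. 1.]
```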


Author(s): Liangxiu Han

This chapter identifies challenges and requirements for resource sharing to support high-performance distributed Service-Oriented Computing (SOC) systems. The chapter draws attention to two popular and important design paradigms, Grid and Peer-to-Peer (P2P) computing systems, which are evolving as two practical solutions for supporting wide-area resource sharing over the Internet. As a fundamental task of resource sharing, efficient resource discovery plays an important role in the SOC setting. The chapter presents resource discovery in Grid and P2P environments through an overview of related systems, both historical and emerging. It then discusses how both technologies can be exploited to facilitate resource discovery within large-scale distributed computing systems in a flexible, scalable, fault-tolerant, interoperable, and secure fashion.


2020, Vol 17 (9), pp. 4411-4418
Author(s): S. Jagannatha, B. N. Tulasimala

In the world of information and communication technology (ICT), the term Cloud Computing has been the buzzword, yet its definition keeps shifting with the way technologists use it in different environments. As a definition it remains contentious: it tends to be stated relative to a particular application, with no unanimous formulation, which makes it altogether elusive. In spite of this, it is this technology that is revolutionizing the traditional use of computer hardware, software, data storage media, and processing mechanisms, with substantial benefits for stakeholders. In the past, autonomous computers and interconnected nodes forming computer networks with shared software resources reduced hardware costs and, to some extent, software costs. Evolutionary changes in computing technology over the past few decades have thus transformed machine architectures, operating systems, network connectivity, and application workloads, making the commercial use of technology predominant. Instead of centralized systems, parallel and distributed systems are now preferred for solving computational problems in the business domain; such hardware is well suited to solving large-scale problems over the Internet. This computing model is data-intensive and network-centric. Many organizations using ICT have found storing huge volumes of data, maintaining and processing them, and automating the entire process through Internet communication to be a challenge. In this paper we explore the growth of Cloud Computing technology over the years: how high-performance computing and high-throughput computing systems enhance computational performance, and how, according to various experts, the scientific community, and service providers, Cloud Computing will prove more cost-effective across different dimensions of the business.


Author(s): D. E. Keyes, H. Ltaief, G. Turkiyyah

A traditional goal of algorithmic optimality, squeezing out flops, has been superseded by evolution in architecture. Flops no longer serve as a reasonable proxy for all aspects of complexity. Instead, algorithms must now squeeze memory, data transfers, and synchronizations, while extra flops on locally cached data represent only small costs in time and energy. Hierarchically low-rank matrices realize a rarely achieved combination of optimal storage complexity and high computational intensity for a wide class of formally dense linear operators that arise in applications for which exascale computers are being constructed. They may be regarded as algebraic generalizations of the fast multipole method. Methods based on these hierarchical data structures, and their simpler cousins, tile low-rank matrices, are well proportioned for early exascale computer architectures, which are provisioned with high processing power relative to memory capacity and memory bandwidth. They are ushering in a renaissance of computational linear algebra. A challenge is that emerging hardware architectures possess hierarchies of their own that do not generally align with those of the algorithm. We describe modules of a software toolkit, hierarchical computations on manycore architectures (HiCMA), that illustrate these features and are intended as building blocks of applications, such as matrix-free higher-order methods in optimization and large-scale spatial statistics. Some modules of this open-source project have been adopted in the software libraries of major vendors. This article is part of the discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
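
As a small illustration of the tile low-rank idea mentioned above, the sketch below compresses one dense tile of a smooth kernel with a truncated SVD; the kernel, cluster geometry, and tolerance are hypothetical, not drawn from the toolkit.

```python
import numpy as np

def compress_tile(T, eps):
    """Tile low-rank compression: replace a dense tile T with factors U, V
    such that ||T - U @ V.T||_2 <= eps, keeping only the leading singular
    triplets. Storage drops from m*n to k*(m + n) when the rank k is small."""
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    k = max(1, int(np.sum(s > eps)))
    return U[:, :k] * s[:k], Vt[:k].T      # (m, k) and (n, k) factors

# Hypothetical smooth kernel: tiles of 1/(1 + |x - y|) between well-separated
# point clusters are numerically low rank.
x = np.linspace(0.0, 1.0, 256)
y = np.linspace(4.0, 5.0, 256)
T = 1.0 / (1.0 + np.abs(x[:, None] - y[None, :]))
U, V = compress_tile(T, eps=1e-8)
print(U.shape[1], np.linalg.norm(T - U @ V.T, 2))  # small rank, error <= eps
```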

