Balancing Energy and Performance in Dense Linear System Solvers for Hybrid ARM+GPU platforms

Author(s):  
Juan P. Silva ◽  
Ernesto Dufrechou ◽  
Pablo Ezzatti ◽  
Enrique S. Quintana-Ortí ◽  
Alfredo Remón ◽  
...  

The high performance computing community has traditionally focused exclusively on reducing execution time, but in recent years the optimization of energy consumption has become a major concern. Reducing energy usage without degrading performance requires the adoption of energy-efficient hardware platforms accompanied by the development of energy-aware algorithms and computational kernels. The solution of linear systems is a key operation for many scientific and engineering problems. Its relevance has motivated a significant amount of work, and consequently, high performance solvers are available for a wide variety of hardware platforms. In this work, we aim to develop a high performance and energy-efficient linear system solver. In particular, we develop two solvers for a low-power CPU-GPU platform, the NVIDIA Jetson TK1. These solvers implement the Gauss-Huard algorithm, making efficient use of the target hardware as well as of memory accesses. The experimental evaluation shows that the novel proposal delivers significant savings in both time and energy consumption compared with the state-of-the-art solvers for the platform.
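For readers unfamiliar with the method, the following is a minimal NumPy sketch of unpivoted Gauss-Huard elimination on the augmented system [A | b]. It only illustrates the row update, scaling, and above-diagonal elimination that the paper maps to the Jetson TK1; the actual solvers add column pivoting and GPU-oriented blocking, and the function name and test data here are illustrative.

```python
import numpy as np

def gauss_huard(A, b):
    """Solve A x = b with (unpivoted) Gauss-Huard elimination.

    Illustrative sketch only: production solvers add column pivoting for
    numerical stability and blocked, GPU-friendly variants of these updates.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.asarray(b, dtype=float).reshape(n, 1)])  # augmented [A | b]
    for k in range(n):
        # 1) eliminate the leading entries of row k using the rows above it
        M[k, k:] -= M[k, :k] @ M[:k, k:]
        # 2) scale row k so its diagonal entry becomes 1
        M[k, k + 1:] /= M[k, k]
        # 3) annihilate column k above the diagonal
        M[:k, k + 1:] -= np.outer(M[:k, k], M[k, k + 1:])
    return M[:, n]  # the last column now holds the solution x

# quick check against NumPy's reference solver
A = np.array([[4.0, 2.0, 1.0], [2.0, 5.0, 3.0], [1.0, 3.0, 6.0]])
b = np.array([7.0, 10.0, 10.0])
assert np.allclose(gauss_huard(A, b), np.linalg.solve(A, b))
```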

2021 ◽  
Vol 11 (3) ◽  
pp. 1169
Author(s):  
Erol Gelenbe ◽  
Miltiadis Siavvas

Long-running software may operate on hardware platforms with limited energy resources, such as batteries or photovoltaic cells, or on high-performance platforms that consume large amounts of energy. Since such systems may be subject to hardware failures, checkpointing is often used to assure the reliability of the application. Because checkpointing introduces additional computation time and energy consumption, we study how checkpoint intervals should be selected so as to minimize a cost function that combines execution time and energy. Expressions for both the program’s energy consumption and its execution time are derived as a function of the failure probability per instruction. A first-principles analysis yields the checkpoint interval that minimizes a linear combination of the average energy consumption and execution time of the program, expressed in terms of the classical “Lambert function”. The sensitivity of the checkpoint interval to the importance attributed to energy consumption is also derived. The results are illustrated with numerical examples for programs of various lengths, showing the relation between the checkpoint interval that minimizes energy consumption and execution time and the one that minimizes a weighted sum of the two. In addition, our results are applied to a popular software benchmark and posted on a publicly accessible web site, together with the optimization software that we have developed.
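As an illustration of the kind of optimization involved, the sketch below minimizes a simplified checkpoint-cost model numerically and compares the result with the square-root closed form that this simplified model admits. The model, parameter names, and values are assumptions for illustration, not the paper's first-principles derivation, whose exact optimum is expressed via the Lambert W function.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Simplified stand-in for a checkpoint cost model (names and form are assumptions):
# a program of N instructions, failure probability p per instruction, checkpoint
# overheads c_t seconds / c_e joules, and t_i seconds / e_i joules per instruction.
# Checkpointing every m instructions costs N/m checkpoints plus, on failure,
# roughly m/2 instructions of re-execution on average.
def weighted_cost(m, N=1e9, p=1e-7, t_i=1e-9, e_i=2e-9, c_t=0.5, c_e=1.0, w=0.5):
    time = N * t_i + (N / m) * c_t + N * p * (m / 2) * t_i
    energy = N * e_i + (N / m) * c_e + N * p * (m / 2) * e_i
    return w * time + (1.0 - w) * energy  # linear combination of time and energy

res = minimize_scalar(weighted_cost, bounds=(1e3, 1e9), method="bounded")
print(f"numerically optimal interval: {res.x:.3e} instructions")

# Closed form for this simplified model (the paper's more exact optimum is
# expressed via the Lambert W function instead of this square root).
w, p, t_i, e_i, c_t, c_e = 0.5, 1e-7, 1e-9, 2e-9, 0.5, 1.0
m_star = np.sqrt(2 * (w * c_t + (1 - w) * c_e) / (p * (w * t_i + (1 - w) * e_i)))
print(f"closed-form approximation: {m_star:.3e} instructions")
```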


2015 ◽  
Vol 25 (03) ◽  
pp. 1541005
Author(s):  
Alexandra Vintila Filip ◽  
Ana-Maria Oprescu ◽  
Stefania Costache ◽  
Thilo Kielmann

High-Performance Computing (HPC) systems consume large amounts of energy. As energy consumption predictions for HPC keep growing, it is important to make users aware of the energy spent on the execution of their applications. Drawing from our experience with exposing cost and performance in public clouds, in this paper we present a generic mechanism to compute fast and accurate estimates of the trade-offs between the performance (expressed as makespan) and the energy consumption of applications running on HPC clusters. We validate our approach with a prototype, called E-BaTS, evaluated on a wide variety of HPC bags-of-tasks. Our experiments show that E-BaTS produces conservative estimates with errors below 5%, while requiring at most 12% of the energy and time of an exhaustive search to provide configurations close to the optimal ones in terms of trade-offs between energy consumption and makespan.
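The toy sketch below illustrates the makespan/energy trade-off space such a tool explores: it enumerates (nodes, frequency) configurations under a crude analytical model and keeps the Pareto-optimal ones. The model, device numbers, and class names are assumptions, not the E-BaTS estimation mechanism.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Config:
    nodes: int        # number of cluster nodes used
    freq_ghz: float   # CPU frequency the nodes run at

def estimate(cfg: Config, n_tasks=1000, task_sec_at_2ghz=10.0, watts_per_ghz=40.0):
    # crude assumed model: task time scales inversely with frequency,
    # the bag of tasks divides perfectly, power is idle + frequency-linear
    task_sec = task_sec_at_2ghz * 2.0 / cfg.freq_ghz
    makespan = task_sec * n_tasks / cfg.nodes
    energy = makespan * cfg.nodes * (20.0 + watts_per_ghz * cfg.freq_ghz)
    return makespan, energy

configs = [Config(n, f) for n, f in product((4, 8, 16, 32), (1.2, 1.8, 2.4))]
points = {c: estimate(c) for c in configs}

# keep only Pareto-optimal configurations (no other config is better in both metrics)
pareto = [c for c, (m, e) in points.items()
          if not any(m2 <= m and e2 <= e and (m2, e2) != (m, e)
                     for m2, e2 in points.values())]
for c in sorted(pareto, key=lambda c: points[c][0]):
    m, e = points[c]
    print(f"{c.nodes:>2} nodes @ {c.freq_ghz} GHz: makespan {m:8.1f} s, energy {e/1e3:8.1f} kJ")
```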


Author(s):  
Tyng-Yeu Liang ◽  
Fu-Chun Lu ◽  
Jun-Yao Chiu

QoS and energy consumption are two important issues for Cloud computing. In this paper, the authors propose a hybrid resource reservation method to address these two issues for scientific workflows in high-performance computing Clouds built on hybrid CPU/GPU architectures. As its name implies, this method reserves the appropriate CPU or GPU for executing the different jobs of the same workflow, based on the execution-time and energy-consumption profile of each resource-to-program pair. They have implemented the proposed resource reservation method on a real service-oriented system. The experimental results show that the proposed method can effectively maintain the QoS of workflows while simultaneously minimizing the energy consumed to execute them.
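A minimal sketch of such profile-driven reservation is given below, assuming per-job (time, energy) profiles for CPU and GPU and a greedy rule that picks the cheapest-energy resource meeting each job's time budget. The rule, job names, and numbers are illustrative assumptions, not the authors' algorithm.

```python
# Hypothetical per-job profiles: (job, resource) -> (execution time in s, energy in J)
PROFILES = {
    ("preprocess", "cpu"): (120.0, 1800.0),  ("preprocess", "gpu"): (40.0, 2600.0),
    ("simulate",   "cpu"): (900.0, 13500.0), ("simulate",   "gpu"): (150.0, 9000.0),
    ("render",     "cpu"): (300.0, 4500.0),  ("render",     "gpu"): (60.0, 3900.0),
}

def reserve(job: str, time_budget: float) -> str:
    """Pick the resource for `job` that minimizes energy within the time budget."""
    candidates = {res: te for (j, res), te in PROFILES.items() if j == job}
    feasible = {res: (t, e) for res, (t, e) in candidates.items() if t <= time_budget}
    if feasible:                                              # cheapest energy that meets QoS
        return min(feasible, key=lambda r: feasible[r][1])
    return min(candidates, key=lambda r: candidates[r][0])   # otherwise, the fastest resource

for job, budget in (("preprocess", 150.0), ("simulate", 200.0), ("render", 500.0)):
    print(job, "->", reserve(job, budget))
```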


2017 ◽  
Vol 36 (2) ◽  
pp. 307-330
Author(s):  
Xuan Thi Tran ◽  
Tien Van Do ◽  
Binh Thai Vu

2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Sun Min ◽  
Yufeng Bi ◽  
Mulian Zheng ◽  
Sai Chen ◽  
Jingjing Li

The energy consumption and greenhouse gas emissions of asphalt pavement have become a serious global problem. Polyurethane (PU) offers excellent high-temperature stability and durability and has recently been studied as an alternative binder to asphalt. However, the strength-forming mechanism and the mixture structure of the PU mixture differ from those of the asphalt mixture. This work explored the design and performance evaluation of the PU mixture. The PU content of the mixtures was determined from the creep slope (K), tensile strength ratio (TSR), immersion Cantabro loss (ICL), and volume of air voids (VV) to ensure better water stability. The high- and low-temperature stability, water stability, dynamic mechanical properties, and sustainability of the PU mixture were evaluated and compared with those of a stone matrix asphalt mixture (SMA). The test results showed that the dynamic stability and bending strain of the PU mixture were about 7.5 and 2.3 times those of the SMA, respectively. The adhesion grade between PU and basalt aggregate was one level higher than with limestone, so basalt aggregates were recommended for the PU mixture to improve water stability. Although the initial TSR and ICL of the PU mixture were lower, its long-term values were higher, indicating better long-term resistance to water damage. The dynamic modulus and phase angles (φ) of the PU mixture were much higher. The energy consumption and CO2 emissions of the PU mixture were lower than those of the SMA. Therefore, the cold-mixed PU mixture is a sustainable material with excellent performance that can be used as a substitute for asphalt mixtures.


Processes ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 2225
Author(s):  
Piotr Hajder ◽  
Łukasz Rauch

Numerical computations are usually associated with High Performance Computing. Nevertheless, both industry and science increasingly involve lower-power devices in computations. This is especially true when the data-collecting devices are able to partially process the data in place, thus increasing the system's reliability. This paradigm is known as Edge Computing. In this paper, we propose the use of lower-power devices at the edge for multi-scale modelling calculations. A system was created consisting of a high-power two-processor workstation, 8 Raspberry Pi 4B microcomputers, and 8 NVIDIA Jetson Nano units equipped with GPUs. As part of this research, benchmarking was performed, on the basis of which the computational capabilities of the devices were classified. Two parameters were considered: the number and performance of computing units (CPUs and GPUs) and the energy consumption of the loaded machines. Then, using the calculated weak scalability and energy consumption, a min-max-based load optimization algorithm was proposed. The system was tested under laboratory conditions, giving similar computation times at the same power consumption for 24 physical workstation cores versus 8 Raspberry Pi 4B and 8 Jetson Nano devices. The work ends with a proposal to apply this solution to industrial processes, using hot rolling of flat products as an example.
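The sketch below shows one simple min-max style split, assuming each device class is characterized only by a benchmarked throughput and an average loaded power; assigning work proportionally to throughput equalizes finishing times and thus minimizes the maximum completion time. The device figures are placeholders, not the paper's measurements.

```python
# Benchmarked device classes (work units / s, average loaded power in W) -- illustrative only.
DEVICES = {
    "workstation (24 cores)": {"throughput": 240.0, "power": 350.0},
    "8x Raspberry Pi 4B":     {"throughput": 120.0, "power": 60.0},
    "8x Jetson Nano":         {"throughput": 160.0, "power": 80.0},
}

def split_work(total_units: float):
    """Assign work proportionally to throughput so all devices finish together,
    which minimizes the maximum (i.e. the overall) completion time."""
    total_rate = sum(d["throughput"] for d in DEVICES.values())
    plan = {}
    for name, d in DEVICES.items():
        units = total_units * d["throughput"] / total_rate
        time = units / d["throughput"]               # identical for every device
        plan[name] = {"units": units, "time_s": time, "energy_j": time * d["power"]}
    return plan

plan = split_work(10_000)
makespan = max(p["time_s"] for p in plan.values())
energy = sum(p["energy_j"] for p in plan.values())
print(f"makespan {makespan:.1f} s, total energy {energy / 1e3:.1f} kJ")
```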


Author(s):  
K. Nagarathna

The Internet of Things (IoT) is an emerging technology that is rapidly attracting many industries and drawing research attention. Although the scale of IoT applications is very large, the capabilities of IoT devices are limited, especially in terms of energy. Various research works have been carried out to alleviate these shortcomings, but the schemes introduced in the literature are complex and difficult to implement in practical scenarios. Therefore, considering the energy consumption of heterogeneous nodes in the IoT ecosystem, a simple energy-efficient routing technique is proposed. The proposed system also employs an SDN controller that acts as a centralized manager to control and monitor network services, thereby restricting the access of selfish nodes to the network. The proposed system constructs an analytical algorithm that provides reliable data transmission and controls energy consumption using a strategic mechanism in which the path selection process is based on the remaining energy of adjacent nodes located in the direction of the destination node. The proposed energy-efficient data forwarding mechanism is compared with the existing AODV routing technique. The simulation results demonstrate that the protocol is superior to AODV in terms of packet delivery rate, throughput, and end-to-end delay.
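A minimal sketch of the described forwarding rule appears below: among the neighbours that make geometric progress toward the destination, the one with the highest residual energy is chosen as the next hop. The topology, energy values, and function names are illustrative assumptions, not the full logic of the proposed protocol.

```python
import math

# Hypothetical node table: node -> (x, y, residual energy in J)
NODES = {
    "A": (0.0, 0.0, 5.0), "B": (1.0, 0.5, 2.0), "C": (1.2, -0.3, 4.5),
    "D": (-0.5, 1.5, 6.0), "DST": (3.0, 0.0, 9.9),
}
NEIGHBOURS = {"A": ["B", "C", "D"]}

def dist(u, v):
    ux, uy, _ = NODES[u]
    vx, vy, _ = NODES[v]
    return math.hypot(ux - vx, uy - vy)

def next_hop(current: str, dst: str):
    # keep only neighbours strictly closer to the destination than the current node
    forward = [n for n in NEIGHBOURS[current] if dist(n, dst) < dist(current, dst)]
    if not forward:
        return None                                      # no forward progress possible
    return max(forward, key=lambda n: NODES[n][2])       # highest residual energy wins

print(next_hop("A", "DST"))   # B and C both make progress; C has more energy left
```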


Author(s):  
Amir Mahdi Hosseini Monazzah ◽  
Amir M. Rahmani ◽  
Antonio Miele ◽  
Nikil Dutt

Due to the persistent quest for larger on-chip memories and caches in multicore and manycore architectures, Spin Transfer Torque Magnetic RAM (STT-MRAM or STT-RAM) has been proposed as a promising technology to replace classical SRAMs in near-future devices. The main advantages of STT-RAMs are a considerably higher transistor density and a negligible leakage power compared with SRAM technology. However, the drawback of this technology is the high probability of errors, especially in write operations. Such errors are asymmetric and transition-dependent, where 0 → 1 is the most critical transition, and their rate depends strongly on the current (voltage) supplied to the memory during the write operation. As a consequence, STT-RAMs present an intrinsic trade-off between energy consumption and reliability that needs to be properly tuned with respect to the currently running application and its reliability requirement. This chapter proposes FlexRel, an energy-aware reliability-improvement architectural scheme for STT-RAM cache memories. FlexRel considers a memory architecture provided with Error Correction Codes (ECCs) and a custom current regulator for the various cache ways, and trades off reliability against energy consumption. The FlexRel cache controller dynamically profiles the number of 0 → 1 transitions of each individual cache-block write and, based on that, selects the most suitable cache way and current level to guarantee the required error-rate threshold (in terms of occurring write errors) while minimizing energy consumption. We experimentally evaluated the efficiency of FlexRel against the most efficient uniform protection scheme from the reliability, energy, area, and performance perspectives. Experimental simulations performed using gem5 have demonstrated that, while FlexRel satisfies the given error-rate threshold, it delivers up to 13.2% energy savings. From the area footprint perspective, FlexRel delivers up to 7.9% cache-way area savings. Furthermore, the performance overhead of the FlexRel algorithm, which changes the traffic patterns of the cache ways during execution, is 1.7% on average.
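The snippet below sketches the core decision in isolation: counting the 0 → 1 transitions of a block write and mapping that count to a write-current level (and hence to a correspondingly protected group of cache ways). The thresholds and current values are invented placeholders rather than FlexRel's calibrated settings.

```python
def count_0_to_1(old_block: int, new_block: int) -> int:
    """Bits that flip from 0 to 1 when `new_block` overwrites `old_block`."""
    return bin(~old_block & new_block).count("1")

# (max 0 -> 1 transitions, write current in microamps) -- illustrative only
CURRENT_LEVELS = [(8, 40), (24, 55), (64, 75)]

def pick_current(old_block: int, new_block: int) -> int:
    """Choose the lowest write current whose threshold covers the flip count."""
    flips = count_0_to_1(old_block, new_block)
    for max_flips, current_ua in CURRENT_LEVELS:
        if flips <= max_flips:
            return current_ua
    return CURRENT_LEVELS[-1][1]        # worst case: highest (most reliable) current

old = 0x00FF00FF00FF00FF
new = 0xFFFF0000FF00FF00
print(count_0_to_1(old, new), "bits flip 0->1 ->", pick_current(old, new), "uA")
```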


2021 ◽  
Vol 20 (5s) ◽  
pp. 1-26
Author(s):  
Guihong Li ◽  
Sumit K. Mandal ◽  
Umit Y. Ogras ◽  
Radu Marculescu

Neural architecture search (NAS) is a promising technique for designing efficient and high-performance deep neural networks (DNNs). As the performance requirements of ML applications grow continuously, hardware accelerators are starting to play a central role in DNN design. This trend makes NAS even more complicated and time-consuming for most real applications. This paper proposes FLASH, a very fast NAS methodology that co-optimizes the DNN accuracy and performance on a real hardware platform. As the main theoretical contribution, we first propose the NN-Degree, an analytical metric to quantify the topological characteristics of DNNs with skip connections (e.g., DenseNets, ResNets, Wide-ResNets, and MobileNets). The newly proposed NN-Degree allows us to do training-free NAS within one second and to build an accuracy predictor by training as few as 25 samples out of a vast search space with more than 63 billion configurations. Second, by performing inference on the target hardware, we fine-tune and validate our analytical models to estimate the latency, area, and energy consumption of various DNN architectures while executing standard ML datasets. Third, we construct a hierarchical algorithm based on simplicial homology global optimization (SHGO) to optimize the model-architecture co-design process, while considering the area, latency, and energy consumption of the target hardware. We demonstrate that, compared to state-of-the-art NAS approaches, our proposed hierarchical SHGO-based algorithm enables more than four orders of magnitude speedup (specifically, the execution time of the proposed algorithm is about 0.1 seconds). Finally, our experimental evaluations show that FLASH is easily transferable to different hardware architectures, enabling NAS on a Raspberry Pi-3B processor in less than 3 seconds.
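As a rough illustration of the SHGO-based co-design step, the sketch below scalarizes toy accuracy, latency, and energy models of an architecture parameterized by width and depth and minimizes the result with SciPy's shgo optimizer. The models, weights, and bounds are assumptions, not FLASH's NN-Degree predictor or its fitted hardware models.

```python
import numpy as np
from scipy.optimize import shgo

def objective(x):
    width, depth = x
    # invented placeholder models: accuracy saturates with model size,
    # latency and energy grow with it
    accuracy = 1.0 - np.exp(-0.02 * width * depth)
    latency_ms = 0.05 * width * depth
    energy_mj = 0.03 * width * depth + 0.5 * depth
    # minimize accuracy loss plus weighted hardware costs
    return (1.0 - accuracy) + 0.002 * latency_ms + 0.002 * energy_mj

bounds = [(8, 256), (2, 40)]          # width multiplier, number of layers
result = shgo(objective, bounds, n=64, iters=3, sampling_method="sobol")
width, depth = result.x
print(f"chosen width={width:.0f}, depth={depth:.0f}, objective={result.fun:.4f}")
```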

