Advance bandwidth reservation for energy efficiency in high-performance networks

Author(s):  
Tong Shu ◽  
Chase Qishi Wu ◽  
Daqing Yun


Author(s):  
Mark Endrei ◽  
Chao Jin ◽  
Minh Ngoc Dinh ◽  
David Abramson ◽  
Heidi Poxon ◽  
...  

Rising power costs and constraints are driving a growing focus on the energy efficiency of high-performance computing systems. The unique characteristics of a particular system and workload, and their effect on performance and energy efficiency, are typically difficult for application users to assess and control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that require only a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH). We demonstrate that the models can be trained with as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to a 45% improvement in energy efficiency for around a 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
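The Pareto-optimal trade-off selection described in this abstract can be illustrated with a short sketch. The following Python snippet is a minimal illustration, not the authors' models; all configuration names and measurements are hypothetical. It filters a small set of sample runs down to the configurations that are not dominated in both runtime and energy.

```python
# Minimal sketch (not the authors' code): selecting Pareto-optimal
# performance/energy trade-off points from a small set of sample runs.
# Configuration names and numbers below are hypothetical.

def pareto_front(runs):
    """Return configurations not dominated in (runtime, energy).

    A run dominates another if it is no worse in both metrics and
    strictly better in at least one.
    """
    front = []
    for name, time_s, energy_j in runs:
        dominated = any(
            t <= time_s and e <= energy_j and (t < time_s or e < energy_j)
            for _, t, e in runs
        )
        if not dominated:
            front.append((name, time_s, energy_j))
    return sorted(front, key=lambda r: r[1])

# Hypothetical measurements: (config, runtime in seconds, energy in joules)
samples = [
    ("32 threads, 2.4 GHz", 120.0, 9500.0),
    ("32 threads, 1.8 GHz", 135.0, 7200.0),
    ("16 threads, 2.4 GHz", 180.0, 8800.0),
    ("64 threads, 2.4 GHz", 110.0, 11000.0),
]

for config, t, e in pareto_front(samples):
    print(f"{config}: {t:.0f} s, {e:.0f} J")
```

A user (or a trained prediction model, as in the paper) can then pick a point on this front that trades a small performance loss for a larger energy saving.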


Electronics ◽  
2021 ◽  
Vol 10 (16) ◽  
pp. 1984
Author(s):  
Wei Zhang ◽  
Zihao Jiang ◽  
Zhiguang Chen ◽  
Nong Xiao ◽  
Yang Ou

Double-precision general matrix multiplication (DGEMM) is an essential kernel for measuring the potential performance of an HPC platform. ARMv8-based systems-on-chip (SoCs) have become candidates for next-generation HPC systems owing to their highly competitive performance and energy efficiency, so it is worthwhile to design high-performance DGEMM for ARMv8-based SoCs. However, as ARMv8-based SoCs integrate ever more cores, modern CPUs adopt non-uniform memory access (NUMA), which restricts the performance and scalability of DGEMM when many threads access remote NUMA domains. This poses a challenge for developing high-performance DGEMM on multi-NUMA architectures. We present a NUMA-aware method that reduces the number of cross-die and cross-chip memory accesses. The critical enabler for NUMA-aware DGEMM is to exploit two levels of parallelism, between and within NUMA nodes, in a purely threaded implementation, which enables task independence and data localization across NUMA nodes. We implemented NUMA-aware DGEMM in OpenBLAS and evaluated it on a dual-socket server with 48-core processors based on the Kunpeng 920 architecture. The results show that NUMA-aware DGEMM effectively reduces cross-die and cross-chip memory accesses, significantly enhancing the scalability of DGEMM and increasing its performance by 17.1% on average, with a peak improvement of 21.9%.
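The two-level parallelism described above can be sketched as a work-partitioning scheme. The Python snippet below is an illustration under assumed node and core counts, not the paper's OpenBLAS implementation: rows of the result matrix are first split across NUMA nodes so each node touches only local data, and each node's share is then split across its cores.

```python
# Minimal sketch (assumptions, not the paper's OpenBLAS code): two-level
# work partitioning for C = A * B on a NUMA machine. The outer level splits
# the rows of A (and C) across NUMA nodes so each node touches only its
# local partition; the inner level splits a node's rows across its cores.
# Node and core counts are hypothetical.

def partition(total, parts):
    """Split range(total) into `parts` contiguous, nearly equal chunks."""
    base, extra = divmod(total, parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks

def numa_aware_plan(m, numa_nodes=4, cores_per_node=12):
    """Return {(node, core): row range of C} for an m-row result matrix."""
    plan = {}
    for node, node_rows in enumerate(partition(m, numa_nodes)):
        for core, core_rows in enumerate(
            partition(len(node_rows), cores_per_node)
        ):
            lo = node_rows.start + core_rows.start
            plan[(node, core)] = range(lo, lo + len(core_rows))
    return plan

# Example: 4800 rows on a hypothetical 4-node, 48-core layout.
plan = numa_aware_plan(4800)
print(plan[(0, 0)], plan[(3, 11)])
```

Because each core's row block lies entirely within its node's partition, memory for A and C can be allocated and first touched locally, avoiding cross-die and cross-chip traffic during the multiplication.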


2021 ◽  
Vol 20 (6) ◽  
pp. 1-28
Author(s):  
Jurn-Gyu Park ◽  
Nikil Dutt ◽  
Sung-Soo Lim

Modern heterogeneous CPU-GPU-based mobile architectures, which execute intensive mobile gaming/graphics applications, use software governors to achieve high performance with energy efficiency. However, existing governors typically rely on simple statistical or heuristic models that assume linear relationships and are fit to small, unbalanced datasets of mobile games; these limitations result in high prediction errors for dynamic and diverse gaming workloads on heterogeneous platforms. To overcome these limitations, we propose an integrated CPU-GPU governor enhanced with interpretable machine learning (ML) models: (1) it builds tree-based piecewise linear models (i.e., model trees) offline, selecting for both high accuracy (low error) and interpretability of the underlying formulas using a simulatability metric based on operation counts; and then (2) it deploys the selected models for online estimation in an integrated CPU-GPU dynamic voltage and frequency scaling (DVFS) governor. Our experiments on a test set of 20 mobile games exhibiting diverse characteristics show that our governor achieves significant energy-efficiency gains, improving energy-per-frame by over 10% on average (up to 38%), together with a surprising but modest 3% improvement in frames-per-second performance, compared to a typical state-of-the-art governor that employs simple linear regression models.
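The model-tree-based governor can be sketched as follows. This Python snippet is a hypothetical illustration; all coefficients, frequency tables, and the FPS model are invented rather than the authors' trained models. It shows a piecewise linear energy-per-frame predictor with a single split, and a DVFS step that chooses the most energy-efficient CPU/GPU frequency pair predicted to meet a frame-rate target.

```python
# Minimal sketch (hypothetical model, not the authors' governor): a tiny
# model tree, i.e. a decision split with a linear model at each leaf, used
# to predict energy-per-frame from candidate CPU/GPU frequencies, and a
# governor step that picks the most efficient pair meeting an FPS target.
# All coefficients, frequency lists, and the FPS model are made up.

CPU_FREQS_GHZ = [0.8, 1.2, 1.6, 2.0]
GPU_FREQS_MHZ = [300, 420, 540, 600]

def predict_energy_per_frame(cpu_ghz, gpu_mhz, gpu_load):
    """Model tree: split on GPU load, linear model in each leaf (mJ/frame)."""
    if gpu_load < 0.5:   # CPU-bound leaf
        return 14.0 + 9.0 * cpu_ghz + 0.004 * gpu_mhz
    else:                # GPU-bound leaf
        return 10.0 + 3.0 * cpu_ghz + 0.030 * gpu_mhz

def predict_fps(cpu_ghz, gpu_mhz):
    """Hypothetical linear throughput model."""
    return 12.0 * cpu_ghz + 0.06 * gpu_mhz

def governor_step(gpu_load, target_fps=30.0):
    """Choose the (CPU, GPU) frequency pair with the lowest predicted
    energy-per-frame among pairs predicted to meet the FPS target."""
    feasible = [
        (predict_energy_per_frame(c, g, gpu_load), c, g)
        for c in CPU_FREQS_GHZ
        for g in GPU_FREQS_MHZ
        if predict_fps(c, g) >= target_fps
    ]
    if not feasible:  # fall back to maximum frequencies
        return CPU_FREQS_GHZ[-1], GPU_FREQS_MHZ[-1]
    _, cpu, gpu = min(feasible)
    return cpu, gpu

print(governor_step(gpu_load=0.7))
```

The leaf models stay interpretable (a handful of additions and multiplications per prediction), which is the kind of property the paper's simulatability metric is meant to capture.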

