Energy-efficient algebra kernels in FPGA for High Performance Computing

2021 ◽  
Vol 21 (2) ◽  
pp. e09
Author(s):  
Federico Favaro ◽  
Ernesto Dufrechou ◽  
Pablo Ezzatti ◽  
Juan Pablo Oliver

The dissemination of multi-core architectures and the later irruption of massively parallel devices led to a revolution in High-Performance Computing (HPC) platforms over the last decades. As a result, Field-Programmable Gate Arrays (FPGAs) are re-emerging as a versatile and more energy-efficient alternative to other platforms. Traditional FPGA design implies using low-level Hardware Description Languages (HDL) such as VHDL or Verilog, which follow an entirely different programming model than standard software languages, and whose use requires specialized knowledge of the underlying hardware. In recent years, manufacturers have made significant efforts to provide High-Level Synthesis (HLS) tools in order to allow a greater adoption of FPGAs in the HPC community. Our work studies the use of multi-core hardware and different FPGAs to address Numerical Linear Algebra (NLA) kernels such as the general matrix multiplication (GEMM) and the sparse matrix-vector multiplication (SpMV). Specifically, we compare the behavior of fine-tuned kernels on a multi-core CPU with HLS implementations on FPGAs. We experimentally evaluate our implementations on a low-end and a cutting-edge FPGA platform in terms of runtime and energy consumption, and compare the results against the Intel MKL library on CPU.
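The abstract does not reproduce the authors' tuned kernels, but a minimal sketch can illustrate the HLS programming style it refers to. The CSR SpMV kernel below follows Xilinx Vitis HLS pragma conventions; the interface bundling and pipelining choices are assumptions made for illustration, not the paper's implementation.

```cpp
// Illustrative CSR sparse matrix-vector multiply (y = A*x) written for an
// HLS flow such as Xilinx Vitis HLS. A sketch of the programming style,
// not the authors' tuned kernel; pragma choices are assumptions.
extern "C" void spmv_csr(const int n_rows,
                         const int *row_ptr,   // CSR row offsets, size n_rows+1
                         const int *col_idx,   // CSR column indices
                         const float *values,  // CSR nonzero values
                         const float *x,       // dense input vector
                         float *y) {           // dense output vector
#pragma HLS INTERFACE m_axi port=row_ptr bundle=gmem0
#pragma HLS INTERFACE m_axi port=col_idx bundle=gmem1
#pragma HLS INTERFACE m_axi port=values  bundle=gmem1
#pragma HLS INTERFACE m_axi port=x       bundle=gmem2
#pragma HLS INTERFACE m_axi port=y       bundle=gmem0

rows:
    for (int i = 0; i < n_rows; ++i) {
        float acc = 0.0f;
    nnz:
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
// II=1 asks the tool to start one multiply-accumulate per cycle; whether
// the target is met depends on the memory interfaces of the device.
#pragma HLS PIPELINE II=1
            acc += values[k] * x[col_idx[k]];
        }
        y[i] = acc;
    }
}
```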

Author(s):  
Simon McIntosh-Smith ◽  
Rob Hunt ◽  
James Price ◽  
Alex Warwick Vesztrocy

High-performance computing systems continue to increase in size in the quest for ever higher performance. The resulting increase in electronic component count, coupled with the decreasing feature sizes of the silicon manufacturing processes used to build these components, may make future exascale systems more susceptible to soft errors caused by cosmic radiation than current high-performance computing systems. Through techniques such as hardware-based error-correcting codes and checkpoint-restart, many of these faults can be mitigated, but at the cost of increased hardware overhead, run-time, and energy consumption that can be as much as 10–20%. Some predictions expect these overheads to continue to grow over time. For extreme-scale systems, these overheads will represent megawatts of power consumption and millions of dollars of additional hardware costs, which could potentially be avoided with more sophisticated fault-tolerance techniques. In this paper we present new software-based fault-tolerance techniques that can be applied to one of the most important classes of software in high-performance computing: iterative sparse matrix solvers. Our new techniques enable us to exploit knowledge of the structure of sparse matrices in such a way as to improve the performance, energy efficiency, and fault tolerance of the overall solution.
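The abstract does not spell out the detection scheme, but one illustrative flavour of structure-exploiting fault tolerance is to fold cheap validity checks on the CSR index arrays into the SpMV itself: a soft error that flips a bit in `row_ptr` or `col_idx` almost always produces an index outside its guaranteed range. The sketch below is a generic example of this idea, not the authors' exact technique.

```cpp
#include <cstdio>
#include <vector>

// Illustrative structure-aware soft-error check folded into a CSR SpMV.
// A bit flip in col_idx or row_ptr usually yields an index outside the
// range the CSR structure guarantees, so a bounds test per element
// detects it at the cost of one comparison. Generic sketch only.
bool spmv_checked(int n_rows, int n_cols,
                  const std::vector<int>& row_ptr,
                  const std::vector<int>& col_idx,
                  const std::vector<double>& values,
                  const std::vector<double>& x,
                  std::vector<double>& y) {
    for (int i = 0; i < n_rows; ++i) {
        int begin = row_ptr[i], end = row_ptr[i + 1];
        // Row pointers must be non-decreasing and within the nonzero count.
        if (begin > end || end > (int)values.size()) {
            std::fprintf(stderr, "fault: corrupted row_ptr at row %d\n", i);
            return false;  // caller can reload the matrix or roll back
        }
        double acc = 0.0;
        for (int k = begin; k < end; ++k) {
            int j = col_idx[k];
            // Column indices must address a valid column of the matrix.
            if (j < 0 || j >= n_cols) {
                std::fprintf(stderr, "fault: corrupted col_idx at nnz %d\n", k);
                return false;
            }
            acc += values[k] * x[j];
        }
        y[i] = acc;
    }
    return true;  // no structural corruption observed
}
```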


2019 ◽  
Author(s):  
Andreas Müller ◽  
Willem Deconinck ◽  
Christian Kühnlein ◽  
Gianmarco Mengaldo ◽  
Michael Lange ◽  
...  

Abstract. In the simulation of complex multi-scale flow problems, such as those arising in weather and climate modelling, one of the biggest challenges is to satisfy operational requirements in terms of time-to-solution and energy-to-solution without compromising the accuracy and stability of the calculation. These competing factors require the development of state-of-the-art algorithms that can optimally exploit the targeted underlying hardware and efficiently deliver the extreme computational capabilities typically required in operational forecast production. These algorithms should (i) minimise the energy footprint along with the time required to produce a solution, (ii) maintain a satisfying level of accuracy, and (iii) be numerically stable and resilient in case of hardware or software failure. The European Centre for Medium-Range Weather Forecasts (ECMWF) is leading a project called ESCAPE (Energy-efficient SCalable Algorithms for weather Prediction on Exascale supercomputers), funded by Horizon 2020 (H2020) under the Future and Emerging Technologies in High Performance Computing (FET-HPC) initiative. The goal of the ESCAPE project is to develop a sustainable strategy for evolving weather and climate prediction models towards next-generation computing technologies. The project partners incorporate the expertise of leading European regional forecasting consortia, university research, experienced high-performance computing centres, and hardware vendors. This paper presents an overview of the results obtained in the ESCAPE project, in which weather prediction models have been broken down into smaller building blocks called dwarfs. The participating weather prediction models are: IFS (Integrated Forecasting System); ALARO, a combination of AROME (Application de la Recherche à l'Opérationnel a Meso-Echelle) and ALADIN (Aire Limitée Adaptation Dynamique Développement International); and COSMO-EULAG, a combination of COSMO (Consortium for Small-scale Modeling) and EULAG (Eulerian/semi-Lagrangian fluid solver). The dwarfs are analysed and optimised in terms of computing performance for different hardware architectures (mainly Intel Skylake CPUs, NVIDIA GPUs, and Intel Xeon Phi). The ESCAPE project includes the development of new algorithms that are specifically designed for better energy efficiency and improved portability through domain-specific languages. In addition, the modularity of the algorithmic framework naturally allows testing different existing numerical approaches and their interplay with the emerging heterogeneous hardware landscape. Throughout the paper, we compare different numerical techniques for solving the main building blocks that constitute weather models, in terms of energy efficiency and performance, on a variety of computing technologies.
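Since time-to-solution and energy-to-solution are the two metrics the dwarfs are measured against, a minimal harness shows how they relate. In this sketch, `read_energy_joules` is a hypothetical placeholder for whatever cumulative energy counter the platform exposes (e.g. RAPL on Intel CPUs or NVML on NVIDIA GPUs); it is not a real API.

```cpp
#include <chrono>
#include <functional>

// Hypothetical placeholder: on a real system this would read a hardware
// energy counter such as RAPL (CPU) or NVML (GPU). Returning 0 keeps the
// sketch compilable without platform-specific code.
double read_energy_joules() { return 0.0; }

struct DwarfMetrics {
    double time_to_solution_s;   // wall-clock runtime of the dwarf
    double energy_to_solution_j; // energy consumed over that runtime
};

// Run one building block ("dwarf") of a weather model, e.g. a spectral
// transform, and report both competing metrics.
DwarfMetrics run_dwarf(const std::function<void()>& dwarf) {
    double e0 = read_energy_joules();
    auto t0 = std::chrono::steady_clock::now();
    dwarf();
    auto t1 = std::chrono::steady_clock::now();
    double e1 = read_energy_joules();
    return { std::chrono::duration<double>(t1 - t0).count(), e1 - e0 };
}
```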


Author(s):  
ROBERT STEWART ◽  
PATRICK MAIER ◽  
PHIL TRINDER

Abstract. Reliability is set to become a major concern on emergent large-scale architectures. While there are many parallel languages, and indeed many parallel functional languages, very few address reliability. The notable exception is the widely emulated Erlang distributed actor model, which provides explicit supervision and recovery of actors with isolated state. We investigate scalable transparent fault-tolerant functional computation with automatic supervision and recovery of tasks. We do so by developing HdpH-RS, a variant of the Haskell distributed parallel Haskell (HdpH) DSL with Reliable Scheduling. Extending the distributed work-stealing protocol of HdpH for task supervision and recovery is challenging. To eliminate elusive concurrency bugs, we validate the HdpH-RS work-stealing protocol using the SPIN model checker. HdpH-RS differs from the actor model in that its principal entities are tasks, i.e. independent stateless computations, rather than isolated stateful actors. Thanks to statelessness, fault recovery can be performed automatically and entirely hidden in the HdpH-RS runtime system. Statelessness is also key to proving a crucial property of the semantics of HdpH-RS: fault recovery does not change the result of the program, akin to deterministic parallelism. HdpH-RS provides a simple distributed fork/join-style programming model, with minimal exposure of fault tolerance at the language level, and a library of higher-level abstractions such as algorithmic skeletons. In fact, the HdpH-RS DSL is exactly the same as the HdpH DSL, hence users can opt in or out of fault-tolerant execution without any refactoring. Computations in HdpH-RS are always as reliable as the root node, no matter how many nodes and cores are actually used. We benchmark HdpH-RS on conventional clusters and on a High Performance Computing platform: all benchmarks survive Chaos Monkey random fault injection; the system scales well, e.g. up to 1,400 cores on the High Performance Computing platform; and reliability and recovery overheads are consistently low even at scale.


2017 ◽  
Vol 8 ◽  
pp. 2689-2710 ◽  
Author(s):  
Igor I Soloviev ◽  
Nikolay V Klenov ◽  
Sergey V Bakurskiy ◽  
Mikhail Yu Kupriyanov ◽  
Alexander L Gudkov ◽  
...  

The predictions of Moore's law are considered by experts to be valid until 2020, giving rise to "post-Moore's" technologies afterwards. Energy efficiency is one of the major challenges in high-performance computing that must be addressed. Superconductor digital technology is a promising post-Moore's alternative for the development of supercomputers. In this paper, we consider the operation principles of energy-efficient superconductor logic and memory circuits, with a short retrospective review of their evolution. We analyze their shortcomings with respect to computer circuit design. Possible directions for further research are outlined.
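For a sense of why superconductor logic is attractive here: in single flux quantum (SFQ) style circuits, the energy per switching event is set by the junction's critical current and the magnetic flux quantum. The worked figure below uses an illustrative critical current of 0.1 mA (a typical order of magnitude, not a value from the paper):

```latex
% Switching energy of a single Josephson junction in SFQ-style logic
E_{\mathrm{sw}} \approx I_c \, \Phi_0,
\qquad \Phi_0 = \frac{h}{2e} \approx 2.07 \times 10^{-15}\,\mathrm{Wb}
% With an illustrative I_c = 0.1 mA:
%   E_sw ≈ (1.0 \times 10^{-4}\,\mathrm{A})(2.07 \times 10^{-15}\,\mathrm{Wb})
%        ≈ 2 \times 10^{-19}\,\mathrm{J} per switching event,
% orders of magnitude below the switching energy of a CMOS gate.
```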

