On the cost of iterative computations

Author(s):  
Erin Carson ◽  
Zdeněk Strakoš

With exascale-level computation on the horizon, the art of predicting the cost of computations has acquired a renewed focus. This task is especially challenging in the case of iterative methods, for which convergence behaviour often cannot be determined with certainty a priori (unless we are satisfied with potentially outrageous overestimates) and which typically suffer from performance bottlenecks at scale due to synchronization cost. Moreover, the amplification of rounding errors can substantially affect the practical performance, in particular for methods with short recurrences. In this article, we focus on what we consider to be key points which are crucial to understanding the cost of iteratively solving linear algebraic systems. This naturally leads us to questions on the place of numerical analysis in relation to mathematics, computer science and sciences, in general. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
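The difficulty the abstract describes, that convergence behaviour of iterative methods cannot be predicted a priori, can be seen even in a textbook conjugate gradient solver (a generic NumPy sketch, not the authors' analysis): two systems of identical size but different spectra need very different iteration counts, so counting floating-point operations per iteration does not determine total cost.

```python
import numpy as np

def cg(A, b, tol=1e-10, max_iter=1000):
    """Textbook conjugate gradient; returns solution and iteration count."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for k in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            return x, k + 1
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, max_iter

# Two SPD matrices of the same size with very different spectra.
n = 200
well = np.diag(np.linspace(1.0, 2.0, n))   # condition number 2
ill = np.diag(np.logspace(0, 6, n))        # condition number 1e6
b = np.ones(n)
_, k_well = cg(well, b)
_, k_ill = cg(ill, b)
```

The well-conditioned system converges in a handful of iterations, the ill-conditioned one in orders of magnitude more; in finite precision the short recurrences can further delay convergence beyond what exact-arithmetic theory predicts.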

Author(s):  
Jack Dongarra ◽  
Laura Grigori ◽  
Nicholas J. Higham

A number of features of today’s high-performance computers make it challenging to exploit these machines fully for computational science. These include increasing core counts but stagnant clock frequencies; the high cost of data movement; use of accelerators (GPUs, FPGAs, coprocessors), making architectures increasingly heterogeneous; and multiple precisions of floating-point arithmetic, including half-precision. Moreover, as well as maximizing speed and accuracy, minimizing energy consumption is an important criterion. New generations of algorithms are needed to tackle these challenges. We discuss some approaches that we can take to develop numerical algorithms for high-performance computational science, with a view to exploiting the next generation of supercomputers. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
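One well-established way to exploit multiple precisions, in the spirit of the approaches this abstract surveys, is mixed-precision iterative refinement: factorize and solve in low precision, then compute residuals and corrections in high precision. The following is a toy NumPy sketch (float32/float64 only, and it re-solves rather than reusing LU factors as a real implementation would), not the authors' code.

```python
import numpy as np

def refine(A, b, n_steps=5):
    """Iterative refinement: solve in float32, compute residuals
    in float64, and correct the solution each step."""
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(n_steps):
        r = b - A @ x                                # residual in float64
        d = np.linalg.solve(A32, r.astype(np.float32))  # cheap correction
        x += d.astype(np.float64)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100)) + 100 * np.eye(100)  # well conditioned
x_true = rng.standard_normal(100)
b = A @ x_true
x = refine(A, b)
err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
```

For a well-conditioned matrix the refined solution reaches double-precision accuracy even though all linear solves ran in single precision; on hardware with fast half-precision units, the same pattern trades most of the arithmetic cost down to the cheap format.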


Author(s):  
Hartwig Anzt ◽  
Erik Boman ◽  
Rob Falgout ◽  
Pieter Ghysels ◽  
Michael Heroux ◽  
...  

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.


Author(s):  
Olivier Beaumont ◽  
Julien Herrmann ◽  
Guillaume Pallez (Aupy) ◽  
Alena Shilova

Deep learning training memory needs can prevent the user from considering large models and large batch sizes. In this work, we propose to use techniques from memory-aware scheduling and automatic differentiation (AD) to execute a backpropagation graph with a bounded memory requirement at the cost of extra recomputations. The case of a single homogeneous chain, i.e. the case of a network whose stages are all identical and form a chain, is well understood and optimal solutions have been proposed in the AD literature. The networks encountered in practice in the context of deep learning are much more diverse, both in terms of shape and heterogeneity. In this work, we define the class of backpropagation graphs, and extend those on which one can compute in polynomial time a solution that minimizes the total number of recomputations. In particular, we consider join graphs which correspond to models such as siamese or cross-modal networks. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
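The homogeneous-chain case the abstract calls well understood can be illustrated with a small checkpoint-and-recompute sketch (a generic illustration of the memory/recomputation trade-off, not the authors' algorithm): store only every few activations in the forward pass, then recompute each segment from its checkpoint during backpropagation.

```python
import math

# Chain of L identical stages f(x) = tanh(x); f'(x) = 1 - tanh(x)^2.
L = 16
f = math.tanh
df = lambda x: 1.0 - math.tanh(x) ** 2

def grad_full(x0):
    """Standard backprop: stores all L+1 activations."""
    xs = [x0]
    for _ in range(L):
        xs.append(f(xs[-1]))
    g = 1.0
    for i in range(L - 1, -1, -1):
        g *= df(xs[i])
    return g

def grad_checkpointed(x0, stride=4):
    """Backprop storing only every `stride`-th activation; segments
    are recomputed during the backward pass, so peak memory is
    O(L/stride + stride) instead of O(L)."""
    ckpts = {0: x0}
    x = x0
    for i in range(1, L + 1):          # forward pass: keep checkpoints only
        x = f(x)
        if i % stride == 0:
            ckpts[i] = x
    g = 1.0
    for end in range(L, 0, -stride):   # backward pass, segment by segment
        start = end - stride
        xs = [ckpts[start]]
        for _ in range(stride):        # recompute this segment's activations
            xs.append(f(xs[-1]))
        for i in range(stride - 1, -1, -1):
            g *= df(xs[i])
    return g
```

Both functions return the same gradient; the checkpointed version pays one extra forward pass per segment. Choosing where to checkpoint optimally on non-chain graphs (joins, siamese networks) is exactly the problem the paper addresses.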


Author(s):  
T. N. Palmer

The case is made for a much closer synergy between climate science, numerical analysis and computer science. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.


Author(s):  
Katherine Yelick ◽  
Aydın Buluç ◽  
Muaaz Awan ◽  
Ariful Azad ◽  
Benjamin Brock ◽  
...  

Genomic datasets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or ‘motifs’ that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
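The 'hashing' motif the abstract argues is missing from established motif lists shows up in its simplest form in k-mer counting, a core step in the profiling and assembly pipelines mentioned. A toy single-node sketch (illustrative only; the parallel versions involve distributed hash tables with the asynchronous updates the abstract describes):

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count all overlapping k-mers of a sequence via a hash table:
    the core hashing motif behind many assembly/profiling pipelines."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

counts = kmer_counts("ACGTACGTAC", 3)  # 8 overlapping 3-mers, 4 distinct
```

The alternative 'sorting' motif realizes the same aggregation by sorting all k-mers and scanning for runs; which of the two wins at scale depends on communication patterns, which is precisely why the authors argue both belong on the motif lists.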


2021 ◽  
Vol 47 (2) ◽  
pp. 1-28
Author(s):  
Goran Flegar ◽  
Hartwig Anzt ◽  
Terry Cojean ◽  
Enrique S. Quintana-Ortí

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing their data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator, such as a preconditioner, in lower than working precision, ideally without impacting the algorithm output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner which selects the precision format used to store the preconditioner data on-the-fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats which optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.
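The core idea can be sketched in a few lines of NumPy: invert each diagonal block and choose its storage precision from the block's condition number, while always computing the application in working precision. The thresholds here are made-up illustrative values, and NumPy cannot express the paper's custom non-IEEE formats; this is a toy model of the selection rule, not the Ginkgo implementation.

```python
import numpy as np

def adaptive_block_jacobi(A, block_size, tau=(1e2, 1e5)):
    """Invert each diagonal block of A and pick its storage precision
    from the block's condition number (illustrative thresholds)."""
    n = A.shape[0]
    blocks = []
    for s in range(0, n, block_size):
        B = A[s:s + block_size, s:s + block_size]
        Binv = np.linalg.inv(B)
        kappa = np.linalg.cond(B)
        if kappa < tau[0]:
            Binv = Binv.astype(np.float16)   # well conditioned: cheap storage
        elif kappa < tau[1]:
            Binv = Binv.astype(np.float32)
        blocks.append(Binv)                   # else keep float64
    return blocks

def apply_prec(blocks, r):
    """Apply M^{-1} r block by block, computing in working precision."""
    out = np.empty_like(r)
    s = 0
    for Binv in blocks:
        bs = Binv.shape[0]
        out[s:s + bs] = Binv.astype(np.float64) @ r[s:s + bs]
        s += bs
    return out

blocks = adaptive_block_jacobi(np.eye(8), 4)  # identity blocks: cond = 1
z = apply_prec(blocks, np.arange(8.0))
```

The storage format only degrades how accurately the block inverse is remembered between iterations, not the arithmetic of the solve itself, which is why a well-conditioned block tolerates an aggressive format while an ill-conditioned one must stay in full precision.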


2013 ◽  
Vol 718-720 ◽  
pp. 1645-1650
Author(s):  
Gen Yin Cheng ◽  
Sheng Chen Yu ◽  
Zhi Yong Wei ◽  
Shao Jie Chen ◽  
You Cheng

The commonly used commercial simulation packages SYSNOISE and ANSYS run on a single machine (they cannot run directly on a parallel machine) when the finite element and boundary element methods are used to simulate muffler performance, and because of the large amount of numerical computation it can take more than ten days, sometimes even twenty, to obtain an accurate solution. To reduce the cost of the numerical simulation, we built a high-performance parallel machine from 32 commodity computers and transformed the finite element and boundary element simulation software into a program that runs under the MPI (message passing interface) parallel environment. The data obtained from the simulation experiments demonstrate that the numerical results are good, and that the computing speed of the high-performance parallel machine is 25 to 30 times that of a single microcomputer.


2021 ◽  
Vol 54 (1) ◽  
Author(s):  
Paul D. Bates

Every year flood events lead to thousands of casualties and significant economic damage. Mapping the areas at risk of flooding is critical to reducing these losses, yet until the last few years such information was available for only a handful of well-studied locations. This review surveys recent progress to address this fundamental issue through a novel combination of appropriate physics, efficient numerical algorithms, high-performance computing, new sources of big data, and model automation frameworks. The review describes the fluid mechanics of inundation and the models used to predict it, before going on to consider the developments that have led in the last five years to the creation of the first true fluid mechanics models of flooding over the entire terrestrial land surface. Expected final online publication date for the Annual Review of Fluid Mechanics, Volume 54 is January 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.


2021 ◽  
Vol 04 (1) ◽  
pp. 54-54
Author(s):  
V. R. Nigmatullin ◽  
I. R. Nigmatullin ◽  
R. G. Nigmatullin ◽  
A. M. Migranov ◽  
...  

Currently, to increase the efficiency of industrial production, high-performance and expensive technological equipment is increasingly used. Its weakest link, from the point of view of efficiency and reliability, is the components and parts of heavily loaded tribo-couplings, which operate both at significantly different temperatures (under comparatively mild conditions the temperature difference can reach 100-120 degrees) and in varied climatic conditions (high humidity, abrasives and other chemical agents in the atmosphere). According to the analysis, the frequency of failures of friction units, and accordingly the cost of their restoration, reaches 9-20 percent of the cost of all equipment, not counting the significant losses of income (profit) that the enterprise incurs from downtime. The solution of this problem is based on studying the wear rate of friction units through the wear products accumulated in working oils, cooling lubricants and greases. A digital equipment monitoring system (DSMT) has been developed and implemented, which includes dynamic recording of the quantity of wear products and the oil temperature by original modern recording devices, followed by technology for processing and using these data. The system also includes methods for finding, in large data sets, information that is useful and necessary in both theoretical and practical terms for similar equipment controlled by the digital monitoring system. The advantages of the system are the ability to predict equipment reliability, reduce production risks and significantly cut inefficient costs.

