On the cost of iterative computations

Author(s):  
Erin Carson ◽  
Zdeněk Strakoš

With exascale-level computation on the horizon, the art of predicting the cost of computations has acquired a renewed focus. This task is especially challenging in the case of iterative methods, for which convergence behaviour often cannot be determined with certainty a priori (unless we are satisfied with potentially outrageous overestimates) and which typically suffer from performance bottlenecks at scale due to synchronization cost. Moreover, the amplification of rounding errors can substantially affect the practical performance, in particular for methods with short recurrences. In this article, we focus on what we consider to be key points which are crucial to understanding the cost of iteratively solving linear algebraic systems. This naturally leads us to questions on the place of numerical analysis in relation to mathematics, computer science and sciences, in general. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
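The difficulty the abstract describes, that convergence behaviour of iterative methods cannot be predicted a priori, can be seen even in a textbook conjugate gradient solver (a generic NumPy sketch, not the authors' analysis): two systems of identical size but different spectra need very different iteration counts, so counting floating-point operations per iteration does not determine total cost.

```python
import numpy as np

def cg(A, b, tol=1e-10, max_iter=1000):
    """Textbook conjugate gradient; returns solution and iteration count."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for k in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            return x, k + 1
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, max_iter

# Two SPD matrices of the same size with very different spectra.
n = 200
well = np.diag(np.linspace(1.0, 2.0, n))   # condition number 2
ill = np.diag(np.logspace(0, 6, n))        # condition number 1e6
b = np.ones(n)
_, k_well = cg(well, b)
_, k_ill = cg(ill, b)
```

The well-conditioned system converges in a handful of iterations, the ill-conditioned one in orders of magnitude more; in finite precision the short recurrences can further delay convergence beyond what exact-arithmetic theory predicts.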

Author(s):  
Jack Dongarra ◽  
Laura Grigori ◽  
Nicholas J. Higham

A number of features of today’s high-performance computers make it challenging to exploit these machines fully for computational science. These include increasing core counts but stagnant clock frequencies; the high cost of data movement; use of accelerators (GPUs, FPGAs, coprocessors), making architectures increasingly heterogeneous; and multiple precisions of floating-point arithmetic, including half-precision. Moreover, as well as maximizing speed and accuracy, minimizing energy consumption is an important criterion. New generations of algorithms are needed to tackle these challenges. We discuss some approaches that we can take to develop numerical algorithms for high-performance computational science, with a view to exploiting the next generation of supercomputers. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
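One well-established way to exploit multiple precisions, in the spirit of the approaches this abstract surveys, is mixed-precision iterative refinement: factorize and solve in low precision, then compute residuals and corrections in high precision. The following is a toy NumPy sketch (float32/float64 only, and it re-solves rather than reusing LU factors as a real implementation would), not the authors' code.

```python
import numpy as np

def refine(A, b, n_steps=5):
    """Iterative refinement: solve in float32, compute residuals
    in float64, and correct the solution each step."""
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(n_steps):
        r = b - A @ x                                # residual in float64
        d = np.linalg.solve(A32, r.astype(np.float32))  # cheap correction
        x += d.astype(np.float64)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100)) + 100 * np.eye(100)  # well conditioned
x_true = rng.standard_normal(100)
b = A @ x_true
x = refine(A, b)
err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
```

For a well-conditioned matrix the refined solution reaches double-precision accuracy even though all linear solves ran in single precision; on hardware with fast half-precision units, the same pattern trades most of the arithmetic cost down to the cheap format.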


Author(s):  
Hartwig Anzt ◽  
Erik Boman ◽  
Rob Falgout ◽  
Pieter Ghysels ◽  
Michael Heroux ◽  
...  

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.


Author(s):  
Olivier Beaumont ◽  
Julien Herrmann ◽  
Guillaume Pallez (Aupy) ◽  
Alena Shilova

Deep learning training memory needs can prevent the user from considering large models and large batch sizes. In this work, we propose to use techniques from memory-aware scheduling and automatic differentiation (AD) to execute a backpropagation graph with a bounded memory requirement at the cost of extra recomputations. The case of a single homogeneous chain, i.e. the case of a network whose stages are all identical and form a chain, is well understood and optimal solutions have been proposed in the AD literature. The networks encountered in practice in the context of deep learning are much more diverse, both in terms of shape and heterogeneity. In this work, we define the class of backpropagation graphs, and extend those on which one can compute in polynomial time a solution that minimizes the total number of recomputations. In particular, we consider join graphs which correspond to models such as siamese or cross-modal networks. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
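The homogeneous-chain case the abstract calls well understood can be illustrated with a small checkpoint-and-recompute sketch (a generic illustration of the memory/recomputation trade-off, not the authors' algorithm): store only every few activations in the forward pass, then recompute each segment from its checkpoint during backpropagation.

```python
import math

# Chain of L identical stages f(x) = tanh(x); f'(x) = 1 - tanh(x)^2.
L = 16
f = math.tanh
df = lambda x: 1.0 - math.tanh(x) ** 2

def grad_full(x0):
    """Standard backprop: stores all L+1 activations."""
    xs = [x0]
    for _ in range(L):
        xs.append(f(xs[-1]))
    g = 1.0
    for i in range(L - 1, -1, -1):
        g *= df(xs[i])
    return g

def grad_checkpointed(x0, stride=4):
    """Backprop storing only every `stride`-th activation; segments
    are recomputed during the backward pass, so peak memory is
    O(L/stride + stride) instead of O(L)."""
    ckpts = {0: x0}
    x = x0
    for i in range(1, L + 1):          # forward pass: keep checkpoints only
        x = f(x)
        if i % stride == 0:
            ckpts[i] = x
    g = 1.0
    for end in range(L, 0, -stride):   # backward pass, segment by segment
        start = end - stride
        xs = [ckpts[start]]
        for _ in range(stride):        # recompute this segment's activations
            xs.append(f(xs[-1]))
        for i in range(stride - 1, -1, -1):
            g *= df(xs[i])
    return g
```

Both functions return the same gradient; the checkpointed version pays one extra forward pass per segment. Choosing where to checkpoint optimally on non-chain graphs (joins, siamese networks) is exactly the problem the paper addresses.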


Author(s):  
T. N. Palmer

The case is made for a much closer synergy between climate science, numerical analysis and computer science. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.


Author(s):  
Katherine Yelick ◽  
Aydın Buluç ◽  
Muaaz Awan ◽  
Ariful Azad ◽  
Benjamin Brock ◽  
...  

Genomic datasets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or ‘motifs’ that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
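The 'hashing' motif the abstract argues is missing from established motif lists shows up in its simplest form in k-mer counting, a core step in the profiling and assembly pipelines mentioned. A toy single-node sketch (illustrative only; the parallel versions involve distributed hash tables with the asynchronous updates the abstract describes):

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count all overlapping k-mers of a sequence via a hash table:
    the core hashing motif behind many assembly/profiling pipelines."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

counts = kmer_counts("ACGTACGTAC", 3)  # 8 overlapping 3-mers, 4 distinct
```

The alternative 'sorting' motif realizes the same aggregation by sorting all k-mers and scanning for runs; which of the two wins at scale depends on communication patterns, which is precisely why the authors argue both belong on the motif lists.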


2021 ◽  
Vol 47 (2) ◽  
pp. 1-28
Author(s):  
Goran Flegar ◽  
Hartwig Anzt ◽  
Terry Cojean ◽  
Enrique S. Quintana-Ortí

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing their data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator, such as a preconditioner, in lower than working precision, ideally without impacting the algorithm output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner which selects the precision format used to store the preconditioner data on-the-fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats which optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.
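The core idea can be sketched in a few lines of NumPy: invert each diagonal block and choose its storage precision from the block's condition number, while always computing the application in working precision. The thresholds here are made-up illustrative values, and NumPy cannot express the paper's custom non-IEEE formats; this is a toy model of the selection rule, not the Ginkgo implementation.

```python
import numpy as np

def adaptive_block_jacobi(A, block_size, tau=(1e2, 1e5)):
    """Invert each diagonal block of A and pick its storage precision
    from the block's condition number (illustrative thresholds)."""
    n = A.shape[0]
    blocks = []
    for s in range(0, n, block_size):
        B = A[s:s + block_size, s:s + block_size]
        Binv = np.linalg.inv(B)
        kappa = np.linalg.cond(B)
        if kappa < tau[0]:
            Binv = Binv.astype(np.float16)   # well conditioned: cheap storage
        elif kappa < tau[1]:
            Binv = Binv.astype(np.float32)
        blocks.append(Binv)                   # else keep float64
    return blocks

def apply_prec(blocks, r):
    """Apply M^{-1} r block by block, computing in working precision."""
    out = np.empty_like(r)
    s = 0
    for Binv in blocks:
        bs = Binv.shape[0]
        out[s:s + bs] = Binv.astype(np.float64) @ r[s:s + bs]
        s += bs
    return out

blocks = adaptive_block_jacobi(np.eye(8), 4)  # identity blocks: cond = 1
z = apply_prec(blocks, np.arange(8.0))
```

The storage format only degrades how accurately the block inverse is remembered between iterations, not the arithmetic of the solve itself, which is why a well-conditioned block tolerates an aggressive format while an ill-conditioned one must stay in full precision.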


2013 ◽  
Vol 718-720 ◽  
pp. 1645-1650
Author(s):  
Gen Yin Cheng ◽  
Sheng Chen Yu ◽  
Zhi Yong Wei ◽  
Shao Jie Chen ◽  
You Cheng

The commonly used commercial simulation packages SYSNOISE and ANSYS run on a single machine (they cannot run directly on a parallel machine) when the finite element and boundary element methods are used to simulate muffler performance, and because of the large amount of numerical computation it can take more than ten days, sometimes even twenty, to obtain an accurate solution. To reduce the cost of the numerical simulation, we built a high-performance parallel machine from 32 commodity computers and transformed the finite element and boundary element simulation software into a program that runs under the MPI (message passing interface) parallel environment. The data obtained from the simulation experiments demonstrate that the numerical results are good, and that the computing speed of the high-performance parallel machine is 25 to 30 times that of a single microcomputer.


2021 ◽  
Vol 54 (1) ◽  
Author(s):  
Paul D. Bates

Every year flood events lead to thousands of casualties and significant economic damage. Mapping the areas at risk of flooding is critical to reducing these losses, yet until the last few years such information was available for only a handful of well-studied locations. This review surveys recent progress to address this fundamental issue through a novel combination of appropriate physics, efficient numerical algorithms, high-performance computing, new sources of big data, and model automation frameworks. The review describes the fluid mechanics of inundation and the models used to predict it, before going on to consider the developments that have led in the last five years to the creation of the first true fluid mechanics models of flooding over the entire terrestrial land surface. Expected final online publication date for the Annual Review of Fluid Mechanics, Volume 54 is January 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.


2021 ◽  
Vol 04 (1) ◽  
pp. 54-54
Author(s):  
V. R. Nigmatullin ◽  
I. R. Nigmatullin ◽  
R. G. Nigmatullin ◽  
A. M. Migranov ◽  
...  

Currently, to increase the efficiency of industrial production, high-performance and expensive technological equipment is increasingly used. Its weakest link, from the point of view of efficiency and reliability, is the components and parts of heavily loaded tribo-couplings, which operate both at significantly different temperatures (under comparatively mild conditions the temperature difference can reach 100-120 degrees) and in varied climatic conditions (high humidity, abrasives and other chemical agents in the atmosphere). According to the analysis, the frequency of failures of friction units, and accordingly the cost of their restoration, reaches 9-20 percent of the cost of all equipment, not counting the significant losses of income (profit) that the enterprise incurs from downtime. The solution of this problem is based on studying the wear rate of friction units through the wear products accumulated in working oils, cooling lubricants and greases. A digital equipment monitoring system (DSMT) has been developed and implemented, which includes dynamic recording of the quantity of wear products and the oil temperature by original modern recording devices, followed by technology for processing and using these data. The system also includes methods for finding, in large data sets, information that is useful and necessary in both theoretical and practical terms for similar equipment controlled by the digital monitoring system. The advantages of the system are the ability to predict equipment reliability, reduce production risks and significantly cut inefficient costs.

