fortran implementation
Recently Published Documents


Total documents: 34 (five years: 3)
H-index: 7 (five years: 1)

2021 ◽  
Author(s):  
Roman Nuterman ◽  
Dion Häfner ◽  
Markus Jochum

Until recently, our pure Python, primitive equation ocean model Veros has been about 1.5x slower than a corresponding Fortran implementation. But thanks to a thriving scientific and machine learning library ecosystem, tremendous speed-ups on GPU, and to a lesser degree CPU, are within reach. Leveraging Google's JAX library, we find that our Python model code can reach a 2-5 times higher energy efficiency on GPU compared to a traditional Fortran model.

Therefore, we propose a new generation of geophysical models: one that combines high-level abstractions and user friendliness on one hand, and that leverages modern developments in high-performance computing and machine learning research on the other hand.

We discuss what there is to gain from building models in high-level programming languages, what we have achieved in Veros, and where we see the modelling community heading in the future.


2020 ◽  
Author(s):  
Roman Nuterman ◽  
Dion Häfner ◽  
Markus Jochum ◽  
Brian Vinter

So far, our pure Python, primitive equation ocean model Veros has been about 50% slower than a corresponding Fortran implementation. But recent benchmarks show that, thanks to a thriving scientific and machine learning library ecosystem, tremendous speed-ups on GPU, and to a lesser degree CPU, are within reach. On GPU, we find that the same model code can reach a 2-5 times higher energy efficiency compared to a traditional Fortran model.

We thus propose a new generation of geophysical models: one that combines high-level abstractions and user friendliness on one hand, and that leverages modern developments in high-performance computing on the other hand.

We discuss what there is to gain from building models in high-level programming languages, what we have achieved, and what the future holds for us and the modelling community.


2019 ◽  
Vol 12 (4) ◽  
pp. 1423-1441 ◽  
Author(s):  
Luca Bertagna ◽  
Michael Deakin ◽  
Oksana Guba ◽  
Daniel Sunderland ◽  
Andrew M. Bradley ◽  
...  

Abstract. We present an architecture-portable and performant implementation of the atmospheric dynamical core (High-Order Methods Modeling Environment, HOMME) of the Energy Exascale Earth System Model (E3SM). The original Fortran implementation is highly performant and scalable on conventional architectures using the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) programming models. We rewrite the model in C++ and use the Kokkos library to express on-node parallelism in a largely architecture-independent implementation. Kokkos provides an abstraction of a compute node or device, layout-polymorphic multidimensional arrays, and parallel execution constructs. The new implementation achieves the same or better performance on conventional multicore computers and is portable to GPUs. We present performance data for the original and new implementations on multiple platforms, on up to 5400 compute nodes, and study several aspects of the single- and multi-node performance characteristics of the new implementation on conventional CPU (e.g., Intel Xeon), many-core CPU (e.g., Intel Xeon Phi Knights Landing), and Nvidia V100 GPU.


2018 ◽  
Author(s):  
Luca Bertagna ◽  
Michael Deakin ◽  
Oksana Guba ◽  
Daniel Sunderland ◽  
Andrew M. Bradley ◽  
...  

Abstract. We present an architecture-portable and performant implementation of the atmospheric dynamical core (HOMME) of the Energy Exascale Earth System Model (E3SM). The original Fortran implementation is highly performant and scalable on conventional architectures using MPI and OpenMP. We rewrite the model in C++ and use the Kokkos library to express on-node parallelism in a largely architecture-independent implementation. Kokkos provides an abstraction of a compute node or device, layout-polymorphic multidimensional arrays, and parallel execution constructs. The new implementation achieves the same or better performance on conventional multicore computers and is portable to GPUs. We present performance data for the original and new implementations on multiple platforms, on up to 5400 compute nodes, and study several aspects of the single- and multi-node performance characteristics of the new implementation on conventional CPU, Intel Xeon Phi Knights Landing, and Nvidia V100 GPU.


2016 ◽  
Vol 33 (7) ◽  
pp. 2149-2164 ◽  
Author(s):  
Feng Chang ◽  
Weiqiang Wang ◽  
Yan Liu ◽  
Yanpeng Qu

Purpose: As one of the earliest high-level programming languages, Fortran is widely used in engineering for its accessibility and computational efficiency. This paper presents a Fortran implementation of isogeometric analysis (IGA) for thin plate problems.
Design/methodology/approach: IGA based on non-uniform rational B-splines (NURBS) represents geometries exactly and is more accurate than finite element analysis (FEA). Unlike the basis functions in FEA, NURBS basis functions are non-interpolatory, so the penalty method is used to enforce boundary conditions.
Findings: Several thin plate examples based on Kirchhoff-Love theory demonstrate the accuracy of the implementation against analytical solutions, and its efficiency was validated by comparison with another open method.
Originality/value: A Fortran implementation of NURBS-based IGA was developed to solve Kirchhoff-Love plate problems. It readily provides the high-continuity basis functions required by the Kirchhoff formulation. Compared with theoretical solutions, the numerical examples demonstrated higher accuracy and faster convergence of the Fortran implementation. The Fortran implementation also addresses the problem of long computation times, as validated by a timing comparison with the Matlab implementation. Because NURBS basis functions are non-interpolatory, the penalty method was used to impose boundary conditions, and guidance on selecting the penalty coefficients is given.
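
The non-interpolatory property mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the paper's Fortran code. It assumes unit NURBS weights, in which case the NURBS basis reduces to a plain B-spline basis, and evaluates it with the standard Cox-de Boor recursion. The names `bspline_basis` and `basis_values` and the knot vector are hypothetical, chosen only for illustration.

```python
def bspline_basis(i, p, knots, x):
    """Value at x of the i-th B-spline basis function of degree p (Cox-de Boor recursion)."""
    if p == 0:
        # Degree 0: indicator function of the half-open knot span [knots[i], knots[i+1]).
        if knots[i] <= x < knots[i + 1]:
            return 1.0
        # Close the last non-empty span so that x == knots[-1] is covered.
        if x == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left = right = 0.0
    d1 = knots[i + p] - knots[i]
    if d1 > 0.0:  # skip terms with a zero-length support (the 0/0 := 0 convention)
        left = (x - knots[i]) / d1 * bspline_basis(i, p - 1, knots, x)
    d2 = knots[i + p + 1] - knots[i + 1]
    if d2 > 0.0:
        right = (knots[i + p + 1] - x) / d2 * bspline_basis(i + 1, p - 1, knots, x)
    return left + right


# Hypothetical example: open (clamped) quadratic knot vector with five basis functions.
degree = 2
knots = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 3.0, 3.0]
n_basis = len(knots) - degree - 1


def basis_values(x):
    """All basis function values at x, in control-point order."""
    return [bspline_basis(i, degree, knots, x) for i in range(n_basis)]


print(basis_values(0.0))  # end point: a single basis function is active
print(basis_values(1.0))  # interior knot: two basis functions share the value
```

At the end of the clamped knot vector a single basis function equals one, but at the interior knot no basis function does. A boundary value therefore cannot in general be prescribed by setting one control variable, which is the situation in which a weak enforcement such as the penalty method is used.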


2013 ◽  
Vol 17 (2) ◽  
pp. 569-583 ◽  
Author(s):  
Deepak Eachempati ◽  
Alan Richardson ◽  
Siddhartha Jana ◽  
Terrence Liao ◽  
Henri Calandra ◽  
...  

Author(s):  
T Brandvik ◽  
G Pullan

The implementation of a two-dimensional Euler solver on graphics hardware is described. The graphics processing unit is highly parallelized and uses a programming model that is well suited to flow computation. Results for a transonic turbine cascade test-case are presented. For large grids (10^6 nodes) a 40 times speed-up compared with a Fortran implementation on a contemporary CPU is observed.

