Next-generation geophysical modelling

Author(s):  
Roman Nuterman ◽  
Dion Häfner ◽  
Markus Jochum ◽  
Brian Vinter

So far, our pure Python, primitive equation ocean model Veros has been about 50% slower than a corresponding Fortran implementation. But recent benchmarks show that, thanks to a thriving scientific and machine learning library ecosystem, tremendous speed-ups on GPU, and to a lesser degree CPU, are within reach. On GPU, we find that the same model code can reach a 2-5 times higher energy efficiency compared to a traditional Fortran model.

We thus propose a new generation of geophysical models: one that combines high-level abstractions and user friendliness on the one hand, and leverages modern developments in high-performance computing on the other.

We discuss what there is to gain from building models in high-level programming languages, what we have achieved, and what the future holds for us and the modelling community.

2021 ◽  
Author(s):  
Roman Nuterman ◽  
Dion Häfner ◽  
Markus Jochum

Until recently, our pure Python, primitive equation ocean model Veros has been about 1.5x slower than a corresponding Fortran implementation. But thanks to a thriving scientific and machine learning library ecosystem, tremendous speed-ups on GPU, and to a lesser degree CPU, are within reach. Leveraging Google's JAX library, we find that our Python model code can reach a 2-5 times higher energy efficiency on GPU compared to a traditional Fortran model.

Therefore, we propose a new generation of geophysical models: one that combines high-level abstractions and user friendliness on the one hand, and leverages modern developments in high-performance computing and machine learning research on the other.

We discuss what there is to gain from building models in high-level programming languages, what we have achieved in Veros, and where we see the modelling community heading in the future.
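As a flavour of the programming model described above, here is a minimal, hypothetical JAX sketch of the kind of finite-difference kernel an ocean model is built from; the kernel, grid size, and coefficients are invented for illustration, and this is not Veros code.

```python
# Minimal, hypothetical sketch of a JAX model kernel (not Veros code).
# jax.jit compiles the function via XLA; the same code runs on CPU or GPU.
import jax
import jax.numpy as jnp

@jax.jit
def diffuse(h, kappa, dx, dt):
    """One explicit time step of 2D diffusion on a periodic grid."""
    lap = (
        jnp.roll(h, 1, axis=0) + jnp.roll(h, -1, axis=0)
        + jnp.roll(h, 1, axis=1) + jnp.roll(h, -1, axis=1)
        - 4.0 * h
    ) / dx**2
    return h + dt * kappa * lap

h = jnp.zeros((512, 512)).at[256, 256].set(1.0)  # point perturbation
for _ in range(100):
    h = diffuse(h, kappa=1e-3, dx=1.0, dt=0.5)
```

Because the kernel is expressed as array operations, moving between CPU and GPU back ends is a configuration choice rather than a rewrite, which is the property the energy-efficiency comparison above relies on.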


2003 ◽  
Vol 11 (1) ◽  
pp. 57-66 ◽  
Author(s):  
Quanming Lu ◽  
Vladimir Getov

Java is receiving increasing attention as the most popular platform for distributed computing. However, programmers are still reluctant to embrace Java as a tool for writing scientific and engineering applications due to its still noticeable performance drawbacks compared with other programming languages such as Fortran or C. In this paper, we present a hybrid Java/Fortran implementation of a parallel particle-in-cell (PIC) algorithm for plasma simulations. In our approach, the time-consuming components of this application are designed and implemented as Fortran subroutines, while less calculation-intensive components usually involved in building the user interface are written in Java. The two types of software modules have been glued together using the Java Native Interface (JNI). Our mixed-language PIC code was tested and its performance compared with pure Java and Fortran versions of the same algorithm on a Sun E6500 SMP system and a Linux cluster of Pentium III machines.
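The division of labour described here (compiled kernels for the hot loop, a high-level language for orchestration) can be sketched by analogy in Python, with ctypes standing in for JNI; the shared library and subroutine below are hypothetical stand-ins, not the paper's code.

```python
# Hypothetical sketch of the mixed-language pattern in Python/ctypes
# (an analogue of the paper's Java/Fortran JNI glue): the hot loop
# lives in a compiled Fortran shared library, the driver stays high-level.
import ctypes
import numpy as np

# Assumed to exist: gfortran -shared -fPIC push_particles.f90 -o libpic.so
lib = ctypes.CDLL("./libpic.so")

n = 100_000
x = np.random.rand(n)          # particle positions
v = np.zeros(n)                # particle velocities
dt = ctypes.c_double(1e-3)

# Fortran passes all arguments by reference.
lib.push_particles_(
    x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)),
    v.ctypes.data_as(ctypes.POINTER(ctypes.c_double)),
    ctypes.byref(ctypes.c_int(n)),
    ctypes.byref(dt),
)
```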


Author(s):  
Umar Ibrahim Minhas ◽  
Roger Woods ◽  
Georgios Karakonstantis

Whilst FPGAs have been used in cloud ecosystems, it is still extremely challenging to achieve high compute density when mapping heterogeneous multi-tasks on shared resources at runtime. This work addresses this by treating the FPGA resource as a service and employing multi-task processing at the high level, design space exploration, and static off-line partitioning in order to allow more efficient mapping of heterogeneous tasks onto the FPGA. In addition, a new, comprehensive runtime functional simulator is used to evaluate the effect of various spatial and temporal constraints on both the existing and new approaches when varying system design parameters. A comprehensive suite of real high performance computing tasks was implemented on a Nallatech 385 FPGA card; the results show that our approach can provide on average 2.9× and 2.3× higher system throughput for compute and mixed intensity tasks, while 0.2× lower throughput for memory intensive tasks due to external memory access latency and bandwidth limitations. The work has been extended by introducing a novel scheduling scheme to enhance temporal utilization of resources when using the proposed approach. Additional results for large queues of mixed intensity tasks (compute and memory) show that the proposed partitioning and scheduling approach can provide higher than 3× system speedup over previous schemes.
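As a toy illustration of the kind of temporal scheduling the abstract describes (not the authors' scheme), the Python sketch below greedily assigns each queued task to whichever statically partitioned FPGA region becomes free first; partition counts and task durations are invented.

```python
# Toy greedy scheduler (illustrative only, not the paper's scheme):
# each task goes to the statically partitioned region that frees up first.
import heapq

def schedule(task_durations, n_partitions):
    """Return the makespan when greedily packing tasks onto partitions."""
    free_at = [0.0] * n_partitions       # min-heap of partition-free times
    heapq.heapify(free_at)
    for d in task_durations:
        t = heapq.heappop(free_at)       # earliest-available partition
        heapq.heappush(free_at, t + d)   # occupy it for the task's duration
    return max(free_at)

# Mixed compute- and memory-intensive tasks (arbitrary durations, in ms):
tasks = [4.0, 1.0, 7.5, 2.0, 3.0, 6.0, 0.5, 5.0]
print(schedule(tasks, n_partitions=3))
```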


Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 627
Author(s):  
David Marquez-Viloria ◽  
Luis Castano-Londono ◽  
Neil Guerrero-Gonzalez

A methodology for scalable and concurrent real-time implementation of highly recurrent algorithms is presented and experimentally validated using the AWS-FPGA. This paper presents a parallel implementation of a KNN algorithm focused on m-QAM demodulators, using high-level synthesis for fast prototyping, parameterization, and scalability of the design. The proposed design shows the successful implementation of the KNN algorithm for interchannel interference mitigation in a 3 × 16 Gbaud 16-QAM Nyquist WDM system. Additionally, we present a modified version of the KNN algorithm in which comparisons among data symbols are reduced by identifying the closest neighbor using the rule of 8-connected clusters used in image processing. Real-time implementation of the modified KNN on a Xilinx Virtex UltraScale+ VU9P AWS-FPGA board was compared with the results obtained in previous work using the same data from the same experimental setup but with offline DSP in Matlab. The results show that the difference is negligible below the FEC limit. Additionally, the modified KNN shows a reduction in operations of 43% to 75%, depending on the symbol's position in the constellation, achieving a 47.25% reduction in total computational time for 100 K input symbols processed on 20 parallel cores compared to the original KNN algorithm.
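A minimal Python sketch of the search-space reduction the abstract describes: instead of testing a received symbol against all 16 constellation points, candidates are restricted to the nearest grid point and its 8-connected neighbours. The grid and decision logic here are schematic, not the paper's implementation.

```python
# Schematic sketch of restricting comparisons to the 8-connected
# neighbourhood of the nearest 16-QAM grid point (not the paper's code).
import numpy as np

LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])  # 16-QAM I/Q amplitude levels

def candidate_points(symbol):
    """Nearest grid point plus its 8-connected neighbours."""
    i = int(np.argmin(np.abs(LEVELS - symbol.real)))
    q = int(np.argmin(np.abs(LEVELS - symbol.imag)))
    cands = []
    for di in (-1, 0, 1):
        for dq in (-1, 0, 1):
            if 0 <= i + di < 4 and 0 <= q + dq < 4:
                cands.append(LEVELS[i + di] + 1j * LEVELS[q + dq])
    return np.array(cands)

rx = 0.8 - 2.4j                      # a received, distorted symbol
cands = candidate_points(rx)         # at most 9 of 16 points to test
decision = cands[np.argmin(np.abs(cands - rx))]
print(decision)                      # -> (1-3j)
```

Restricting the candidate set this way is what yields the position-dependent reduction in comparisons: interior constellation points have nine candidates, edge and corner points fewer.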


Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1275
Author(s):  
Changdao Du ◽  
Yoshiki Yamaguchi

Due to performance and energy requirements, FPGA-based accelerators have become a promising solution for high-performance computations. Meanwhile, with the help of high-level synthesis (HLS) compilers, FPGAs can be programmed using common programming languages such as C, C++, or OpenCL, thereby improving design efficiency and portability. Stencil computations are significant kernels in various scientific applications. In this paper, we introduce an architecture design for implementing stencil kernels on a state-of-the-art FPGA with high bandwidth memory (HBM). Traditional FPGAs are usually equipped with external memory, e.g., DDR3 or DDR4, which limits the design space exploration in the spatial domain of stencil kernels. Therefore, many previous studies mainly relied on exploiting parallelism in the temporal domain to eliminate the bandwidth limitations. In our approach, we scale up the design performance by considering both the spatial and temporal parallelism of the stencil kernel equally. We also discuss the design portability among different HLS compilers. We use typical stencil kernels to evaluate our design on a Xilinx U280 FPGA board and compare the results with other existing studies. By adopting our method, developers can take broad parallelization strategies based on specific FPGA resources to improve performance.
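To make the spatial/temporal distinction concrete, here is a schematic Python sketch (not the paper's HLS design): spatial parallelism applies the stencil across the whole grid in one pass, while temporal parallelism chains several time steps back-to-back before returning to memory. Grid size, coefficients, and the depth parameter are invented.

```python
# Schematic NumPy sketch of the two parallelism axes of a stencil kernel
# (illustrative only, not the paper's HLS design).
import numpy as np

def step(a):
    """One 3-point Jacobi step: the 'spatial' work over the whole grid."""
    out = a.copy()
    out[1:-1] = 0.25 * a[:-2] + 0.5 * a[1:-1] + 0.25 * a[2:]
    return out

def run(a, n_steps, temporal_depth):
    """Chain `temporal_depth` steps per pass, mimicking the temporal
    pipelining an FPGA uses to trade compute for memory bandwidth."""
    for _ in range(n_steps // temporal_depth):
        for _ in range(temporal_depth):  # back-to-back steps, one pass
            a = step(a)
    return a

a = np.zeros(1024); a[512] = 1.0
a = run(a, n_steps=8, temporal_depth=4)
```

On an FPGA, each of the chained steps becomes a pipeline stage, so deeper temporal chains raise compute throughput without raising external memory traffic; HBM relaxes that constraint and reopens the spatial axis.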


2019 ◽  
Vol 3 (4) ◽  
pp. 902-904
Author(s):  
Alexander Peyser ◽  
Sandra Diaz Pier ◽  
Wouter Klijn ◽  
Abigail Morrison ◽  
Jochen Triesch

Large-scale in silico experimentation depends on the generation of connectomes beyond available anatomical structure. We suggest that linking research across the fields of experimental connectomics, theoretical neuroscience, and high-performance computing can enable a new generation of models bridging the gap between biophysical detail and global function. This Focus Feature on “Linking Experimental and Computational Connectomics” aims to bring together some examples from these domains as a step toward the development of more comprehensive generative models of multiscale connectomes.


Author(s):  
JOST BERTHOLD ◽  
HANS-WOLFGANG LOIDL ◽  
KEVIN HAMMOND

Over time, several competing approaches to parallel Haskell programming have emerged. Different approaches support parallelism at various different scales, ranging from small multicores to massively parallel high-performance computing systems. They also provide varying degrees of control, ranging from completely implicit approaches to ones providing full programmer control. Most current designs assume a shared memory model at the programmer, implementation and hardware levels. This is, however, becoming increasingly divorced from the reality at the hardware level. It also imposes significant unwanted runtime overheads in the form of garbage collection, synchronisation, etc. What is needed is an easy way to abstract over the implementation and hardware levels, while presenting a simple parallelism model to the programmer. The PArallEl shAred Nothing runtime system design aims to provide a portable and high-level shared-nothing implementation platform for parallel Haskell dialects. It abstracts over major issues such as work distribution and data serialisation, consolidating existing, successful designs into a single framework. It also provides an optional virtual shared-memory programming abstraction for (possibly) shared-nothing parallel machines, such as modern multicore/manycore architectures or cluster/cloud computing systems. It builds on, unifies, and extends existing well-developed support for shared-memory parallelism that is provided by the widely used GHC Haskell compiler. This paper summarises the state-of-the-art in shared-nothing parallel Haskell implementations, introduces the PArallEl shAred Nothing abstractions, shows how they can be used to implement three distinct parallel Haskell dialects, and demonstrates that good scalability can be obtained on recent parallel machines.
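The shared-nothing model the abstract describes can be illustrated by a loose analogue in Python, where multiprocessing likewise hides work distribution and data serialisation behind a small API; this is a cross-language analogue for intuition only, not the PArallEl shAred Nothing design itself.

```python
# Loose Python analogue of a shared-nothing model (not the PAEAN design):
# worker processes share no memory; inputs and results are serialised
# (pickled) and shipped between processes by the runtime.
from multiprocessing import Pool

def simulate(seed):
    """An independent unit of work: no shared state with other workers."""
    x = seed
    for _ in range(10_000):
        x = (1103515245 * x + 12345) % 2**31   # toy LCG workload
    return x

if __name__ == "__main__":
    with Pool(processes=4) as pool:            # runtime handles distribution
        results = pool.map(simulate, range(8)) # and data serialisation
    print(results)
```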


2014 ◽  
Vol 556-562 ◽  
pp. 3949-3951
Author(s):  
Jian Xin Zhu

Data mining is a technique that aims to analyze and understand large volumes of source data and to reveal the knowledge hidden in them. It has been viewed as an important evolution in information processing. It has attracted growing attention from researchers and businesses because of the wide availability of huge amounts of data and the imminent need to turn such data into valuable information. Over the past decade or more, the concepts and techniques of data mining have been developed, and some have been the subject of higher-level discussion in recent years. Data mining involves an integration of techniques from databases, artificial intelligence, machine learning, statistics, knowledge engineering, object-oriented methods, information retrieval, high-performance computing, and visualization. Essentially, data mining is a high-level analysis technology with a strong orientation toward business profit. Unlike OLTP applications, data mining should provide in-depth data analysis and support for business decisions.

