Next-generation geophysical modelling

Author(s):  
Roman Nuterman ◽  
Dion Häfner ◽  
Markus Jochum ◽  
Brian Vinter

So far, our pure Python, primitive equation ocean model Veros has been about 50% slower than a corresponding Fortran implementation. But recent benchmarks show that, thanks to a thriving scientific and machine learning library ecosystem, tremendous speed-ups on GPU, and to a lesser degree CPU, are within reach. On GPU, we find that the same model code can reach a 2-5 times higher energy efficiency compared to a traditional Fortran model.

We thus propose a new generation of geophysical models: one that combines high-level abstractions and user friendliness on the one hand, and leverages modern developments in high-performance computing on the other.

We discuss what there is to gain from building models in high-level programming languages, what we have achieved, and what the future holds for us and the modelling community.

2021 ◽  
Author(s):  
Roman Nuterman ◽  
Dion Häfner ◽  
Markus Jochum

Until recently, our pure Python, primitive equation ocean model Veros has been about 1.5x slower than a corresponding Fortran implementation. But thanks to a thriving scientific and machine learning library ecosystem, tremendous speed-ups on GPU, and to a lesser degree CPU, are within reach. Leveraging Google's JAX library, we find that our Python model code can reach a 2-5 times higher energy efficiency on GPU compared to a traditional Fortran model.

Therefore, we propose a new generation of geophysical models: one that combines high-level abstractions and user friendliness on the one hand, and leverages modern developments in high-performance computing and machine learning research on the other.

We discuss what there is to gain from building models in high-level programming languages, what we have achieved in Veros, and where we see the modelling community heading in the future.
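As a flavour of the programming model described above, here is a minimal, hypothetical JAX sketch of the kind of finite-difference kernel an ocean model is built from; the kernel, grid size, and coefficients are invented for illustration, and this is not Veros code.

```python
# Minimal, hypothetical sketch of a JAX model kernel (not Veros code).
# jax.jit compiles the function via XLA; the same code runs on CPU or GPU.
import jax
import jax.numpy as jnp

@jax.jit
def diffuse(h, kappa, dx, dt):
    """One explicit time step of 2D diffusion on a periodic grid."""
    lap = (
        jnp.roll(h, 1, axis=0) + jnp.roll(h, -1, axis=0)
        + jnp.roll(h, 1, axis=1) + jnp.roll(h, -1, axis=1)
        - 4.0 * h
    ) / dx**2
    return h + dt * kappa * lap

h = jnp.zeros((512, 512)).at[256, 256].set(1.0)  # point perturbation
for _ in range(100):
    h = diffuse(h, kappa=1e-3, dx=1.0, dt=0.5)
```

Because the kernel is expressed as array operations, moving between CPU and GPU back ends is a configuration choice rather than a rewrite, which is the property the energy-efficiency comparison above relies on.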


2003 ◽  
Vol 11 (1) ◽  
pp. 57-66 ◽  
Author(s):  
Quanming Lu ◽  
Vladimir Getov

Java is receiving increasing attention as the most popular platform for distributed computing. However, programmers are still reluctant to embrace Java as a tool for writing scientific and engineering applications due to its still noticeable performance drawbacks compared with other programming languages such as Fortran or C. In this paper, we present a hybrid Java/Fortran implementation of a parallel particle-in-cell (PIC) algorithm for plasma simulations. In our approach, the time-consuming components of this application are designed and implemented as Fortran subroutines, while less calculation-intensive components usually involved in building the user interface are written in Java. The two types of software modules have been glued together using the Java Native Interface (JNI). Our mixed-language PIC code was tested and its performance compared with pure Java and Fortran versions of the same algorithm on a Sun E6500 SMP system and a Linux cluster of Pentium III machines.
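The division of labour described here (compiled kernels for the hot loop, a high-level language for orchestration) can be sketched by analogy in Python, with ctypes standing in for JNI; the shared library and subroutine below are hypothetical stand-ins, not the paper's code.

```python
# Hypothetical sketch of the mixed-language pattern in Python/ctypes
# (an analogue of the paper's Java/Fortran JNI glue): the hot loop
# lives in a compiled Fortran shared library, the driver stays high-level.
import ctypes
import numpy as np

# Assumed to exist: gfortran -shared -fPIC push_particles.f90 -o libpic.so
lib = ctypes.CDLL("./libpic.so")

n = 100_000
x = np.random.rand(n)          # particle positions
v = np.zeros(n)                # particle velocities
dt = ctypes.c_double(1e-3)

# Fortran passes all arguments by reference.
lib.push_particles_(
    x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)),
    v.ctypes.data_as(ctypes.POINTER(ctypes.c_double)),
    ctypes.byref(ctypes.c_int(n)),
    ctypes.byref(dt),
)
```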


Author(s):  
Umar Ibrahim Minhas ◽  
Roger Woods ◽  
Georgios Karakonstantis

Whilst FPGAs have been used in cloud ecosystems, it is still extremely challenging to achieve high compute density when mapping heterogeneous multi-tasks on shared resources at runtime. This work addresses this by treating the FPGA resource as a service and employing multi-task processing at the high level, design space exploration, and static off-line partitioning in order to allow more efficient mapping of heterogeneous tasks onto the FPGA. In addition, a new, comprehensive runtime functional simulator is used to evaluate the effect of various spatial and temporal constraints on both the existing and new approaches when varying system design parameters. A comprehensive suite of real high performance computing tasks was implemented on a Nallatech 385 FPGA card; the results show that our approach can provide on average 2.9× and 2.3× higher system throughput for compute and mixed intensity tasks, while 0.2× lower throughput for memory intensive tasks due to external memory access latency and bandwidth limitations. The work has been extended by introducing a novel scheduling scheme to enhance temporal utilization of resources when using the proposed approach. Additional results for large queues of mixed intensity tasks (compute and memory) show that the proposed partitioning and scheduling approach can provide higher than 3× system speedup over previous schemes.
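As a toy illustration of the kind of temporal scheduling the abstract describes (not the authors' scheme), the Python sketch below greedily assigns each queued task to whichever statically partitioned FPGA region becomes free first; partition counts and task durations are invented.

```python
# Toy greedy scheduler (illustrative only, not the paper's scheme):
# each task goes to the statically partitioned region that frees up first.
import heapq

def schedule(task_durations, n_partitions):
    """Return the makespan when greedily packing tasks onto partitions."""
    free_at = [0.0] * n_partitions       # min-heap of partition-free times
    heapq.heapify(free_at)
    for d in task_durations:
        t = heapq.heappop(free_at)       # earliest-available partition
        heapq.heappush(free_at, t + d)   # occupy it for the task's duration
    return max(free_at)

# Mixed compute- and memory-intensive tasks (arbitrary durations, in ms):
tasks = [4.0, 1.0, 7.5, 2.0, 3.0, 6.0, 0.5, 5.0]
print(schedule(tasks, n_partitions=3))
```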


Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 627
Author(s):  
David Marquez-Viloria ◽  
Luis Castano-Londono ◽  
Neil Guerrero-Gonzalez

A methodology for scalable and concurrent real-time implementation of highly recurrent algorithms is presented and experimentally validated using the AWS-FPGA. This paper presents a parallel implementation of a KNN algorithm focused on m-QAM demodulators, using high-level synthesis for fast prototyping, parameterization, and scalability of the design. The proposed design shows the successful implementation of the KNN algorithm for interchannel interference mitigation in a 3 × 16 Gbaud 16-QAM Nyquist WDM system. Additionally, we present a modified version of the KNN algorithm in which comparisons among data symbols are reduced by identifying the closest neighbor using the rule of 8-connected clusters used in image processing. Real-time implementation of the modified KNN on a Xilinx Virtex UltraScale+ VU9P AWS-FPGA board was compared with the results obtained in previous work using the same data from the same experimental setup but with offline DSP in Matlab. The results show that the difference is negligible below the FEC limit. Additionally, the modified KNN shows a reduction in operations of 43% to 75%, depending on the symbol's position in the constellation, achieving a 47.25% reduction in total computational time for 100 K input symbols processed on 20 parallel cores compared to the original KNN algorithm.
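A minimal Python sketch of the search-space reduction the abstract describes: instead of testing a received symbol against all 16 constellation points, candidates are restricted to the nearest grid point and its 8-connected neighbours. The grid and decision logic here are schematic, not the paper's implementation.

```python
# Schematic sketch of restricting comparisons to the 8-connected
# neighbourhood of the nearest 16-QAM grid point (not the paper's code).
import numpy as np

LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])  # 16-QAM I/Q amplitude levels

def candidate_points(symbol):
    """Nearest grid point plus its 8-connected neighbours."""
    i = int(np.argmin(np.abs(LEVELS - symbol.real)))
    q = int(np.argmin(np.abs(LEVELS - symbol.imag)))
    cands = []
    for di in (-1, 0, 1):
        for dq in (-1, 0, 1):
            if 0 <= i + di < 4 and 0 <= q + dq < 4:
                cands.append(LEVELS[i + di] + 1j * LEVELS[q + dq])
    return np.array(cands)

rx = 0.8 - 2.4j                      # a received, distorted symbol
cands = candidate_points(rx)         # at most 9 of 16 points to test
decision = cands[np.argmin(np.abs(cands - rx))]
print(decision)                      # -> (1-3j)
```

Restricting the candidate set this way is what yields the position-dependent reduction in comparisons: interior constellation points have nine candidates, edge and corner points fewer.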


Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1275
Author(s):  
Changdao Du ◽  
Yoshiki Yamaguchi

Due to performance and energy requirements, FPGA-based accelerators have become a promising solution for high-performance computations. Meanwhile, with the help of high-level synthesis (HLS) compilers, FPGAs can be programmed using common programming languages such as C, C++, or OpenCL, thereby improving design efficiency and portability. Stencil computations are significant kernels in various scientific applications. In this paper, we introduce an architecture design for implementing stencil kernels on a state-of-the-art FPGA with high bandwidth memory (HBM). Traditional FPGAs are usually equipped with external memory, e.g., DDR3 or DDR4, which limits the design space exploration in the spatial domain of stencil kernels. Therefore, many previous studies mainly relied on exploiting parallelism in the temporal domain to eliminate the bandwidth limitations. In our approach, we scale up the design performance by considering both the spatial and temporal parallelism of the stencil kernel equally. We also discuss the design portability among different HLS compilers. We use typical stencil kernels to evaluate our design on a Xilinx U280 FPGA board and compare the results with other existing studies. By adopting our method, developers can take broad parallelization strategies based on specific FPGA resources to improve performance.
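To make the spatial/temporal distinction concrete, here is a schematic Python sketch (not the paper's HLS design): spatial parallelism applies the stencil across the whole grid in one pass, while temporal parallelism chains several time steps back-to-back before returning to memory. Grid size, coefficients, and the depth parameter are invented.

```python
# Schematic NumPy sketch of the two parallelism axes of a stencil kernel
# (illustrative only, not the paper's HLS design).
import numpy as np

def step(a):
    """One 3-point Jacobi step: the 'spatial' work over the whole grid."""
    out = a.copy()
    out[1:-1] = 0.25 * a[:-2] + 0.5 * a[1:-1] + 0.25 * a[2:]
    return out

def run(a, n_steps, temporal_depth):
    """Chain `temporal_depth` steps per pass, mimicking the temporal
    pipelining an FPGA uses to trade compute for memory bandwidth."""
    for _ in range(n_steps // temporal_depth):
        for _ in range(temporal_depth):  # back-to-back steps, one pass
            a = step(a)
    return a

a = np.zeros(1024); a[512] = 1.0
a = run(a, n_steps=8, temporal_depth=4)
```

On an FPGA, each of the chained steps becomes a pipeline stage, so deeper temporal chains raise compute throughput without raising external memory traffic; HBM relaxes that constraint and reopens the spatial axis.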


2019 ◽  
Vol 3 (4) ◽  
pp. 902-904
Author(s):  
Alexander Peyser ◽  
Sandra Diaz Pier ◽  
Wouter Klijn ◽  
Abigail Morrison ◽  
Jochen Triesch

Large-scale in silico experimentation depends on the generation of connectomes beyond available anatomical structure. We suggest that linking research across the fields of experimental connectomics, theoretical neuroscience, and high-performance computing can enable a new generation of models bridging the gap between biophysical detail and global function. This Focus Feature on “Linking Experimental and Computational Connectomics” aims to bring together some examples from these domains as a step toward the development of more comprehensive generative models of multiscale connectomes.


Author(s):  
JOST BERTHOLD ◽  
HANS-WOLFGANG LOIDL ◽  
KEVIN HAMMOND

Over time, several competing approaches to parallel Haskell programming have emerged. Different approaches support parallelism at various different scales, ranging from small multicores to massively parallel high-performance computing systems. They also provide varying degrees of control, ranging from completely implicit approaches to ones providing full programmer control. Most current designs assume a shared memory model at the programmer, implementation and hardware levels. This is, however, becoming increasingly divorced from the reality at the hardware level. It also imposes significant unwanted runtime overheads in the form of garbage collection, synchronisation, etc. What is needed is an easy way to abstract over the implementation and hardware levels, while presenting a simple parallelism model to the programmer. The PArallEl shAred Nothing runtime system design aims to provide a portable and high-level shared-nothing implementation platform for parallel Haskell dialects. It abstracts over major issues such as work distribution and data serialisation, consolidating existing, successful designs into a single framework. It also provides an optional virtual shared-memory programming abstraction for (possibly) shared-nothing parallel machines, such as modern multicore/manycore architectures or cluster/cloud computing systems. It builds on, unifies, and extends existing well-developed support for shared-memory parallelism that is provided by the widely used GHC Haskell compiler. This paper summarises the state-of-the-art in shared-nothing parallel Haskell implementations, introduces the PArallEl shAred Nothing abstractions, shows how they can be used to implement three distinct parallel Haskell dialects, and demonstrates that good scalability can be obtained on recent parallel machines.
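The shared-nothing model the abstract describes can be illustrated by a loose analogue in Python, where multiprocessing likewise hides work distribution and data serialisation behind a small API; this is a cross-language analogue for intuition only, not the PArallEl shAred Nothing design itself.

```python
# Loose Python analogue of a shared-nothing model (not the PAEAN design):
# worker processes share no memory; inputs and results are serialised
# (pickled) and shipped between processes by the runtime.
from multiprocessing import Pool

def simulate(seed):
    """An independent unit of work: no shared state with other workers."""
    x = seed
    for _ in range(10_000):
        x = (1103515245 * x + 12345) % 2**31   # toy LCG workload
    return x

if __name__ == "__main__":
    with Pool(processes=4) as pool:            # runtime handles distribution
        results = pool.map(simulate, range(8)) # and data serialisation
    print(results)
```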


2014 ◽  
Vol 556-562 ◽  
pp. 3949-3951
Author(s):  
Jian Xin Zhu

Data mining is a technique that aims to analyze and understand large volumes of source data and to reveal the knowledge hidden in them. It has been viewed as an important evolution in information processing. It has attracted growing attention from researchers and businesses because of the wide availability of huge amounts of data and the imminent need to turn such data into valuable information. Over the past decade or more, the concepts and techniques of data mining have been developed, and some have been the subject of higher-level discussion in recent years. Data mining involves an integration of techniques from databases, artificial intelligence, machine learning, statistics, knowledge engineering, object-oriented methods, information retrieval, high-performance computing, and visualization. Essentially, data mining is a high-level analysis technology with a strong orientation toward business profit. Unlike OLTP applications, data mining should provide in-depth data analysis and support for business decisions.

