PhotoNs-GPU: A GPU accelerated cosmological simulation code

2021 ◽  
Vol 21 (11) ◽  
pp. 281
Author(s):  
Qiao Wang ◽  
Chen Meng

Abstract We present a GPU-accelerated cosmological simulation code, PhotoNs-GPU, based on the Particle Mesh Fast Multipole Method (PM-FMM) algorithm, and focus on GPU utilization and optimization. An interpolation method for the truncated gravity is introduced to speed up the special-function evaluations in the kernels. We verify the GPU code in mixed precision and at different levels of the interpolation method on the GPU. A single-precision run is roughly two times faster than a double-precision run for current practical cosmological simulations, but it can introduce a small, unbiased noise in the power spectrum. Compared with the CPU version of PhotoNs and with Gadget-2, the efficiency of the new code is significantly improved. With all optimizations of memory access, kernel functions and concurrency management activated, the peak performance of our test runs reaches 48% of the theoretical speed and the average performance approaches ∼35% on the GPU.
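The abstract describes replacing special-function evaluations in the truncated-gravity kernel with an interpolated lookup. The following is a minimal sketch of that general idea only, assuming an erfc-style short-range truncation of the kind used in tree/PM force splits; it is not the PhotoNs-GPU kernel, and the splitting scale and functional form are placeholders.

```python
# Sketch of tabulating a short-range truncation factor and interpolating it,
# instead of evaluating erfc/exp per particle pair. Not the PhotoNs-GPU code;
# r_split and the damping formula are assumptions.
import numpy as np
from scipy.special import erfc

r_split = 1.25            # hypothetical PM/FMM force-splitting scale
n_table = 4096            # number of table points
r_max = 6.0 * r_split     # beyond this the truncated force is negligible

# Precompute the truncation factor once on a uniform grid.
r_grid = np.linspace(1e-6, r_max, n_table)
trunc_table = erfc(r_grid / (2.0 * r_split)) \
    + r_grid / (r_split * np.sqrt(np.pi)) * np.exp(-r_grid**2 / (4.0 * r_split**2))

def truncation(r):
    """Linear interpolation of the tabulated truncation factor.

    A GPU kernel can do the analogous lookup cheaply from shared or
    texture memory instead of calling special functions per pair."""
    idx = np.clip(r / r_max * (n_table - 1), 0, n_table - 2)
    i = idx.astype(int)
    frac = idx - i
    return (1.0 - frac) * trunc_table[i] + frac * trunc_table[i + 1]
```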

2020 ◽  
Author(s):  
Alessandro Cotronei ◽  
Thomas Slawig

Abstract. We converted the radiation part of the atmospheric model ECHAM to single-precision arithmetic. We analyzed different conversion strategies and finally used a step-by-step change of all modules, subroutines and functions. We found that a small portion of the code still requires higher-precision arithmetic. We generated code that can easily be changed from double to single precision and vice versa, essentially via a simple switch in one module. We compared the output of the single-precision version at coarse resolution with observational data and with the original double-precision code; the results of the two versions are comparable. We extensively tested different parallelization options with respect to the possible performance gain, at both coarse and low resolution. The single-precision radiation itself was accelerated by about 40%, whereas the speed-up for the whole ECHAM model using the converted radiation reached 18% in the best configuration. We further measured the energy consumption, which could also be reduced.
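The "simple switch in one module" pattern centralizes the working precision so the rest of the code never hard-codes a floating-point kind. ECHAM itself is Fortran and changes a KIND parameter; the sketch below is only a NumPy analogue of the same pattern, with hypothetical module and variable names.

```python
# precision.py -- hedged sketch of a single working-precision switch.
# Not ECHAM code; the Fortran original changes a KIND parameter instead.
import numpy as np

USE_SINGLE = True                                   # the one switch
wp = np.float32 if USE_SINGLE else np.float64       # working precision
dp = np.float64                                     # pinned double precision for the few
                                                    # code portions that still require it

def zeros(shape, dtype=None):
    """All model arrays are allocated through helpers like this, so flipping
    USE_SINGLE converts the whole code base in one place."""
    return np.zeros(shape, dtype=dtype or wp)
```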


2007 ◽  
Vol 1 (1) ◽  
pp. 41-76 ◽  
Author(s):  
R. Greve ◽  
S. Otsu

Abstract. The north-east Greenland ice stream (NEGIS) was discovered as a large fast-flow feature of the Greenland ice sheet in synthetic aperture radar (SAR) imagery from the ERS-1 satellite. In this study, the NEGIS is implemented in the dynamic/thermodynamic, large-scale ice-sheet model SICOPOLIS (Simulation Code for POLythermal Ice Sheets). In the first step, we simulate the evolution of the ice sheet on a 10-km grid for the period from 250 ka ago until today, driven by a climatology reconstructed from a combination of present-day observations and GCM results for the past. We assume that the NEGIS area is characterized by enhanced basal sliding compared to the "normal", slowly-flowing areas of the ice sheet, and find that the misfit between simulated and observed ice thicknesses and surface velocities is minimized for a sliding enhancement by a factor of three. In the second step, the consequences of the NEGIS, and also of surface-meltwater-induced acceleration of basal sliding, for the possible decay of the Greenland ice sheet in future warming climates are investigated. It is demonstrated that the ice sheet is generally very susceptible to global warming on time-scales of centuries and that surface-meltwater-induced acceleration of basal sliding can speed up the decay significantly, whereas the NEGIS is not likely to dynamically destabilize the ice sheet as a whole.
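The study's key tuning step is multiplying basal sliding inside the NEGIS area by an enhancement factor (best fit: three). The sketch below shows that pattern with a generic Weertman-type sliding law; the coefficient and exponents are placeholders and the actual SICOPOLIS sliding law may use different values and additional terms.

```python
# Hedged sketch: spatially enhanced basal sliding, not the SICOPOLIS code.
# c_sl, p and q are placeholder values for a Weertman-type law.
import numpy as np

def basal_sliding_velocity(tau_b, p_w, negis_mask, c_sl=10.0, p=3, q=2,
                           enhancement=3.0):
    """Weertman-type sliding v_b = E * c_sl * tau_b**p / p_w**q, with
    E = enhancement inside the NEGIS mask and E = 1 elsewhere."""
    E = np.where(negis_mask, enhancement, 1.0)
    return E * c_sl * tau_b**p / np.maximum(p_w, 1e-3)**q
```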


2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
D. Z. Ding ◽  
G. M. Li ◽  
Y. Y. An ◽  
R. S. Chen

Higher-order hierarchical Legendre basis functions combined with the electric field integral equation (EFIE) are developed to solve scattering problems from rough surfaces. A hierarchical two-level spectral preconditioning method is developed for the generalized minimal residual (GMRES) iterative method. The hierarchical two-level spectral preconditioner is constructed by combining a spectral preconditioner with a sparse approximate inverse (SAI) preconditioner to speed up the convergence of the iterative method. The multilevel fast multipole method (MLFMM) is employed to reduce the memory requirement and computational complexity of the method of moments (MoM) solution. The accuracy and efficiency of the approach are confirmed with several numerical examples.
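The role of a preconditioner here is to be applied inside each GMRES iteration so the preconditioned system converges in far fewer iterations. A minimal sketch of wiring a preconditioner into GMRES with SciPy is shown below; the M used is only a trivial diagonal (Jacobi) stand-in, not the paper's hierarchical two-level spectral/SAI preconditioner, and the test matrix is synthetic.

```python
# Hedged sketch: supplying a preconditioner to GMRES via scipy. The diagonal
# stand-in below is NOT the two-level spectral/SAI preconditioner of the paper.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 2000
A = sp.random(n, n, density=1e-3, format="csr") + 4.0 * sp.identity(n, format="csr")
b = np.ones(n)

# Stand-in approximate inverse: the inverse of the diagonal of A.
M = spla.LinearOperator((n, n), matvec=lambda x: x / A.diagonal())

x, info = spla.gmres(A, b, M=M, restart=50, maxiter=500)
print("converged" if info == 0 else f"GMRES stopped with info={info}")
```

A SAI preconditioner would replace the diagonal matvec with multiplication by a sparse approximation of A^{-1}; the interface to GMRES stays the same.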


2018 ◽  
Vol 24 (3) ◽  
pp. 351-366
Author(s):  
Marcos Aurélio Basso ◽  
Daniel Rodrigues dos Santos

Abstract In this paper, we present a method for 3D mapping of indoor environments using RGB-D data. The contribution of our proposed method is two-fold. First, our method exploits a joint effort of the speeded-up robust features (SURF) algorithm and a disparity-to-plane model for a coarse-to-fine registration procedure. Because the coarse-to-fine registration task accumulates errors, the same features can appear in two different locations of the map; this is known as the loop closure problem. We therefore propose using the variance-covariance matrix that describes the uncertainty of the transformation parameters (3D rotation and 3D translation) for view-based loop closure detection, followed by a graph-based optimization, to achieve a consistent 3D indoor map. To demonstrate and evaluate the effectiveness of the proposed method, experimental datasets obtained in three indoor environments with different levels of detail are used. The experimental results show that the proposed framework can create 3D indoor maps with an error of 11.97 cm in object space, which corresponds to a positional imprecision of around 1.5% over the 9 m distance travelled by the sensor.
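The coarse registration step relies on matching SURF keypoints between consecutive frames. Below is a minimal OpenCV sketch of that step only, assuming hypothetical frame file names; SURF lives in the contrib module (opencv-contrib-python built with non-free algorithms) and may be absent from some builds, and the paper's disparity-to-plane model and graph-based loop-closure optimization are not reproduced.

```python
# Hedged sketch: SURF keypoint matching between two RGB frames with OpenCV.
# Requires opencv-contrib with non-free algorithms enabled; file names are
# hypothetical placeholders.
import cv2

img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kp1, des1 = surf.detectAndCompute(img1, None)
kp2, des2 = surf.detectAndCompute(img2, None)

# Ratio-test matching to obtain putative correspondences for coarse registration.
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} putative correspondences")
```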


2020 ◽  
Author(s):  
Jiayi Lai

The next generation of weather and climate models will have an unprecedented level of resolution and model complexity, while also increasing the requirements on computation and memory speed. Reducing the precision of certain variables and using mixed-precision methods in atmospheric models can greatly improve computing and memory speed. However, to ensure the accuracy of the results, most models have over-designed their numerical precision, which means the resources occupied are much larger than those actually required. Previous studies have shown that the precision necessary for an accurate weather model has a clear scale dependence, with large spatial scales requiring higher precision than small scales; even at large scales the necessary precision is far below double precision. However, it is difficult to find a principled method for assigning different precisions to different variables and thereby avoiding unnecessary waste. This paper takes CESM1.2.1 as the research object, conducts a large number of reduced-precision tests, and proposes a new discrimination method similar to the CFL criterion. This method allows the correctness of a single variable to be verified, thereby determining which variables can use a lower level of precision without degrading the accuracy of the results.
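The basic experiment behind such per-variable screening is to demote one variable to lower precision, rerun, and compare against a double-precision reference. The sketch below shows only that generic demote-and-compare idea on a toy two-variable iteration; it is not the CFL-like criterion proposed in the abstract and not CESM code.

```python
# Hedged, generic sketch of per-variable precision screening on a toy model.
import numpy as np

def toy_step(t, q, dt=1e-3, kappa=0.1):
    """Stand-in for one model time step coupling two variables."""
    return t + dt * kappa * q, q - dt * kappa * t

rng = np.random.default_rng(0)
t0 = rng.standard_normal(100_000)
q0 = rng.standard_normal(100_000)

# Reference run entirely in double precision.
t_ref, q_ref = t0.copy(), q0.copy()
for _ in range(1000):
    t_ref, q_ref = toy_step(t_ref, q_ref)

# Same run with one variable demoted to single precision.
t_mix, q_mix = t0.copy(), q0.astype(np.float32)
for _ in range(1000):
    t_mix, q_mix = toy_step(t_mix, q_mix)
    q_mix = q_mix.astype(np.float32)   # keep the demoted variable in float32

rel_err = np.abs(t_mix - t_ref).max() / np.abs(t_ref).max()
print(f"max relative error with q in float32: {rel_err:.2e}")
```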


2010 ◽  
Vol 3 ◽  
Author(s):  
Alexis Palmer ◽  
Taesun Moon ◽  
Jason Baldridge ◽  
Katrin Erk ◽  
Eric Campbell ◽  
...  

With the urgent need to document the world's dying languages, it is important to explore ways to speed up language documentation efforts. One promising avenue is to use techniques from computational linguistics to automate some of the process. Here we consider unsupervised morphological segmentation and active learning for creating interlinear glossed text (IGT) for the Mayan language Uspanteko. The practical goal is to produce a fully annotated corpus that is as accurate as possible given limited time for manual annotation. We discuss results from several experiments that suggest there is indeed much promise in these methods but also show that further development is necessary to make them robustly useful for a wide range of conditions and tasks. We also provide a detailed discussion of how two documentary linguists perceived machine support in IGT production and how their annotation performance varied with different levels of machine support.
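Active learning in this setting means repeatedly asking the annotator to label the examples the model is least certain about. The sketch below shows uncertainty sampling in that generic sense with scikit-learn; the classifier, features and "oracle" are toy placeholders, not the Uspanteko IGT pipeline used in the paper.

```python
# Hedged sketch of uncertainty-sampling active learning for a gloss labeller.
# Data, features and the oracle are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_pool = rng.standard_normal((5000, 20))                 # unlabeled items (toy features)
y_pool = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)   # oracle = the human annotator

labeled = list(range(50))                                # small seed set
for _ in range(5):
    clf = LogisticRegression(max_iter=200).fit(X_pool[labeled], y_pool[labeled])
    proba = clf.predict_proba(X_pool)
    margin = np.abs(proba[:, 1] - proba[:, 0])           # small margin = uncertain
    picks = [i for i in np.argsort(margin) if i not in labeled][:50]
    labeled.extend(picks)                                 # "annotate" the most uncertain items
print(f"labeled {len(labeled)} of {len(X_pool)} items")
```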


Author(s):  
Anandakumar Haldorai ◽  
Shrinand Anandakumar

The segmentation step of therapy treatment includes a detailed examination of medical images. In diagnosis, clinical research, and patient management, medical images are mainly acquired through radiographic methods, so image processing software for medical imaging is crucial. The analysis of a medical image can be improved and accelerated using a bioMIP technique. This article presents a biomedical imaging software tool that aims to provide a similar level of programmability while investigating pipelined processing solutions. These tools mimic entire systems composed of many of the recommended processing segments, arranged in configurations defined by a schematic framework. In this paper, 15 biomedical imaging technologies are evaluated on a number of different levels. The comparison's primary goal is to collect and analyze data in order to suggest which medical imaging program users of various operating systems should use when analyzing various kinds of images. The article includes a results table that is reviewed.


2020 ◽  
Vol 37 (6) ◽  
pp. 2193-2211 ◽  
Author(s):  
Shengquan Wang ◽  
Chao Wang ◽  
Yong Cai ◽  
Guangyao Li

Purpose The purpose of this paper is to improve the computational speed of solving nonlinear dynamics by using parallel methods and a mixed-precision algorithm on graphics processing units (GPUs). The computational efficiency of traditional central processing unit (CPU)-based computer-aided engineering software has struggled to satisfy the needs of scientific research and practical engineering, especially for nonlinear dynamic problems. Moreover, when calculations are performed on GPUs, double-precision operations are slower than single-precision operations. This paper therefore implements mixed precision for nonlinear dynamic problem simulation using the Belytschko-Tsay (BT) shell element on a GPU. Design/methodology/approach To minimize data transfer between heterogeneous architectures, the parallel computation of the fully explicit finite element (FE) calculation is realized using a vectorized thread-level parallelism algorithm. An asynchronous data transmission strategy and a novel dependency-relationship-link-based method for efficiently solving the parallel explicit shell element equations are used to improve the GPU utilization ratio. Finally, this paper implements mixed precision for nonlinear dynamic problem simulation using the BT shell element on a GPU and compares it to the CPU-based serially executed program and a GPU-based double-precision parallel computing program. Findings For a car body model containing approximately 5.3 million degrees of freedom, the computational speed is improved 25 times over CPU sequential computation, and by approximately 10% over the double-precision parallel computing method. The accuracy error of the mixed-precision computation is small and can satisfy the requirements of practical engineering problems. Originality/value This paper realizes a novel FE parallel computing procedure for nonlinear dynamic problems using a mixed-precision algorithm on a CPU-GPU platform. Compared with the CPU serial program, the program implemented in this article obtains a 25-times acceleration ratio when calculating a model of 883,168 elements, which greatly improves the calculation speed for solving nonlinear dynamic problems.
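A common mixed-precision pattern for explicit dynamics is to do the element-level force work in single precision while keeping the integrated state in double precision. The sketch below illustrates only that pattern with a toy force on a GPU via CuPy (assumed installed, with a CUDA device available); it is not the paper's BT shell kernels, asynchronous transfer strategy or dependency-link scheme.

```python
# Hedged sketch of the mixed-precision pattern for explicit time integration on
# a GPU with CuPy. The force model is a toy linear restoring force.
import cupy as cp

n_nodes = 1_000_000
mass = cp.ones(n_nodes, dtype=cp.float64)
disp = cp.zeros(n_nodes, dtype=cp.float64)
vel = cp.zeros(n_nodes, dtype=cp.float64)
dt = 1e-6

def internal_force_f32(disp):
    """Stand-in for single-precision element force evaluation."""
    return -1.0e4 * disp.astype(cp.float32)

f_ext = cp.zeros(n_nodes, dtype=cp.float32)
f_ext[0] = 1.0

for step in range(100):
    f = (f_ext + internal_force_f32(disp)).astype(cp.float64)  # promote before integration
    acc = f / mass
    vel += dt * acc        # explicit update kept in double precision
    disp += dt * vel
```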


2020 ◽  
Author(s):  
Oriol Tintó ◽  
Stella Valentina Paronuzzi Ticco ◽  
Mario C. Acosta ◽  
Miguel Castrillo ◽  
Kim Serradell ◽  
...  

One of the requirements to keep improving the science produced using NEMO is to enhance its computational performance. The interest in improving its capability to use the computational infrastructure efficiently is two-fold: on one side, there are experiments that would only be possible if a certain throughput threshold is achieved; on the other side, any development that increases efficiency helps to save resources while reducing the environmental impact of our experiments. One of the opportunities that has raised interest in the last few years is the optimization of numerical precision. Historical reasons led many computational models to over-engineer their numerical precision; correcting this mismatch can pay back in terms of efficiency and throughput. In this direction, research was carried out to safely reduce the numerical precision in NEMO, which led to a mixed-precision version of the model. The implementation has been developed following the approach proposed by Tintó et al. (2019), in which the variables that require double precision are identified automatically and the remaining ones are switched to single precision. The implementation will be released in 2020, and this work presents its evaluation in terms of both performance and scientific results.
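The automatic identification of double-precision variables can be framed as a search: demote a group of variables, check the model output against a double-precision reference, and recursively narrow down the failing groups. The sketch below shows that divide-and-conquer idea on a toy stand-in; it is not the actual tool of Tintó et al. (2019), the variable names are hypothetical, and the bisection ignores possible interactions between variables.

```python
# Hedged sketch of a divide-and-conquer search for variables that must stay in
# double precision. run_and_check is a toy stand-in, not a NEMO run.
SENSITIVE = {"ssh", "temp"}        # hypothetical: variables that truly need double

def run_and_check(demoted):
    """Pretend model run: 'passes' only if no sensitive variable is demoted."""
    return SENSITIVE.isdisjoint(demoted)

def find_double_vars(variables):
    """Return the variables that cannot be demoted, by recursive bisection."""
    if run_and_check(set(variables)):
        return set()               # the whole group is safe in single precision
    if len(variables) == 1:
        return set(variables)      # a single failing variable must stay double
    mid = len(variables) // 2
    return find_double_vars(variables[:mid]) | find_double_vars(variables[mid:])

all_vars = ["ssh", "temp", "sal", "u", "v", "w", "rho", "ice"]
print(find_double_vars(all_vars))  # -> {'ssh', 'temp'}
```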

