PhotoNs-GPU: A GPU accelerated cosmological simulation code

2021 ◽  
Vol 21 (11) ◽  
pp. 281
Author(s):  
Qiao Wang ◽  
Chen Meng

Abstract We present a GPU-accelerated cosmological simulation code, PhotoNs-GPU, based on the Particle Mesh Fast Multipole Method (PM-FMM) algorithm, and focus on GPU utilization and optimization. An interpolation method for the truncated gravity is introduced to speed up the special-function evaluations in the kernels. We verify the GPU code in mixed precision and at different levels of the interpolation method on the GPU. A single-precision run is roughly two times faster than a double-precision run for current practical cosmological simulations, but it can introduce a small, unbiased noise in the power spectrum. Compared with the CPU version of PhotoNs and with Gadget-2, the efficiency of the new code is significantly improved. With all optimizations of memory access, kernel functions and concurrency management activated, the peak performance of our test runs reaches 48% of the theoretical speed and the average performance approaches ∼35% on the GPU.
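The abstract describes replacing special-function evaluations in the truncated-gravity kernel with an interpolated lookup. The following is a minimal sketch of that general idea only, assuming an erfc-style short-range truncation of the kind used in tree/PM force splits; it is not the PhotoNs-GPU kernel, and the splitting scale and functional form are placeholders.

```python
# Sketch of tabulating a short-range truncation factor and interpolating it,
# instead of evaluating erfc/exp per particle pair. Not the PhotoNs-GPU code;
# r_split and the damping formula are assumptions.
import numpy as np
from scipy.special import erfc

r_split = 1.25            # hypothetical PM/FMM force-splitting scale
n_table = 4096            # number of table points
r_max = 6.0 * r_split     # beyond this the truncated force is negligible

# Precompute the truncation factor once on a uniform grid.
r_grid = np.linspace(1e-6, r_max, n_table)
trunc_table = erfc(r_grid / (2.0 * r_split)) \
    + r_grid / (r_split * np.sqrt(np.pi)) * np.exp(-r_grid**2 / (4.0 * r_split**2))

def truncation(r):
    """Linear interpolation of the tabulated truncation factor.

    A GPU kernel can do the analogous lookup cheaply from shared or
    texture memory instead of calling special functions per pair."""
    idx = np.clip(r / r_max * (n_table - 1), 0, n_table - 2)
    i = idx.astype(int)
    frac = idx - i
    return (1.0 - frac) * trunc_table[i] + frac * trunc_table[i + 1]
```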

2020 ◽  
Author(s):  
Alessandro Cotronei ◽  
Thomas Slawig

Abstract. We converted the radiation part of the atmospheric model ECHAM to single-precision arithmetic. We analyzed different conversion strategies and finally used a step-by-step change of all modules, subroutines and functions. We found that a small portion of the code still requires higher-precision arithmetic. We generated code that can easily be changed from double to single precision and vice versa, essentially via a simple switch in one module. We compared the output of the single-precision version at coarse resolution with observational data and with the original double-precision code; the results of the two versions are comparable. We extensively tested different parallelization options with respect to the possible performance gain, at both coarse and low resolution. The single-precision radiation itself was accelerated by about 40%, whereas the speed-up for the whole ECHAM model using the converted radiation reached 18% in the best configuration. We further measured the energy consumption, which could also be reduced.
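The "simple switch in one module" pattern centralizes the working precision so the rest of the code never hard-codes a floating-point kind. ECHAM itself is Fortran and changes a KIND parameter; the sketch below is only a NumPy analogue of the same pattern, with hypothetical module and variable names.

```python
# precision.py -- hedged sketch of a single working-precision switch.
# Not ECHAM code; the Fortran original changes a KIND parameter instead.
import numpy as np

USE_SINGLE = True                                   # the one switch
wp = np.float32 if USE_SINGLE else np.float64       # working precision
dp = np.float64                                     # pinned double precision for the few
                                                    # code portions that still require it

def zeros(shape, dtype=None):
    """All model arrays are allocated through helpers like this, so flipping
    USE_SINGLE converts the whole code base in one place."""
    return np.zeros(shape, dtype=dtype or wp)
```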


2007 ◽  
Vol 1 (1) ◽  
pp. 41-76 ◽  
Author(s):  
R. Greve ◽  
S. Otsu

Abstract. The north-east Greenland ice stream (NEGIS) was discovered as a large fast-flow feature of the Greenland ice sheet in synthetic aperture radar (SAR) imagery from the ERS-1 satellite. In this study, the NEGIS is implemented in the dynamic/thermodynamic, large-scale ice-sheet model SICOPOLIS (Simulation Code for POLythermal Ice Sheets). In the first step, we simulate the evolution of the ice sheet on a 10-km grid for the period from 250 ka ago until today, driven by a climatology reconstructed from a combination of present-day observations and GCM results for the past. We assume that the NEGIS area is characterized by enhanced basal sliding compared to the "normal", slowly-flowing areas of the ice sheet, and find that the misfit between simulated and observed ice thicknesses and surface velocities is minimized for a sliding enhancement by a factor of three. In the second step, the consequences of the NEGIS, and also of surface-meltwater-induced acceleration of basal sliding, for the possible decay of the Greenland ice sheet in future warming climates are investigated. It is demonstrated that the ice sheet is generally very susceptible to global warming on time-scales of centuries and that surface-meltwater-induced acceleration of basal sliding can speed up the decay significantly, whereas the NEGIS is not likely to dynamically destabilize the ice sheet as a whole.
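The study's key tuning step is multiplying basal sliding inside the NEGIS area by an enhancement factor (best fit: three). The sketch below shows that pattern with a generic Weertman-type sliding law; the coefficient and exponents are placeholders and the actual SICOPOLIS sliding law may use different values and additional terms.

```python
# Hedged sketch: spatially enhanced basal sliding, not the SICOPOLIS code.
# c_sl, p and q are placeholder values for a Weertman-type law.
import numpy as np

def basal_sliding_velocity(tau_b, p_w, negis_mask, c_sl=10.0, p=3, q=2,
                           enhancement=3.0):
    """Weertman-type sliding v_b = E * c_sl * tau_b**p / p_w**q, with
    E = enhancement inside the NEGIS mask and E = 1 elsewhere."""
    E = np.where(negis_mask, enhancement, 1.0)
    return E * c_sl * tau_b**p / np.maximum(p_w, 1e-3)**q
```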


2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
D. Z. Ding ◽  
G. M. Li ◽  
Y. Y. An ◽  
R. S. Chen

Higher-order hierarchical Legendre basis functions combined with the electric field integral equation (EFIE) are developed to solve scattering problems from rough surfaces. A hierarchical two-level spectral preconditioning method is developed for the generalized minimal residual (GMRES) iterative method. The hierarchical two-level spectral preconditioner is constructed by combining a spectral preconditioner with a sparse approximate inverse (SAI) preconditioner to speed up the convergence of the iterative method. The multilevel fast multipole method (MLFMM) is employed to reduce the memory requirement and computational complexity of the method of moments (MoM) solution. The accuracy and efficiency of the approach are confirmed with several numerical examples.
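The role of a preconditioner here is to be applied inside each GMRES iteration so the preconditioned system converges in far fewer iterations. A minimal sketch of wiring a preconditioner into GMRES with SciPy is shown below; the M used is only a trivial diagonal (Jacobi) stand-in, not the paper's hierarchical two-level spectral/SAI preconditioner, and the test matrix is synthetic.

```python
# Hedged sketch: supplying a preconditioner to GMRES via scipy. The diagonal
# stand-in below is NOT the two-level spectral/SAI preconditioner of the paper.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 2000
A = sp.random(n, n, density=1e-3, format="csr") + 4.0 * sp.identity(n, format="csr")
b = np.ones(n)

# Stand-in approximate inverse: the inverse of the diagonal of A.
M = spla.LinearOperator((n, n), matvec=lambda x: x / A.diagonal())

x, info = spla.gmres(A, b, M=M, restart=50, maxiter=500)
print("converged" if info == 0 else f"GMRES stopped with info={info}")
```

A SAI preconditioner would replace the diagonal matvec with multiplication by a sparse approximation of A^{-1}; the interface to GMRES stays the same.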


2018 ◽  
Vol 24 (3) ◽  
pp. 351-366
Author(s):  
Marcos Aurélio Basso ◽  
Daniel Rodrigues dos Santos

Abstract In this paper, we present a method for 3D mapping of indoor environments using RGB-D data. The contribution of our proposed method is two-fold. First, our method exploits a joint effort of the speeded-up robust features (SURF) algorithm and a disparity-to-plane model for a coarse-to-fine registration procedure. Because the coarse-to-fine registration task accumulates errors, the same features can appear in two different locations of the map; this is known as the loop closure problem. We therefore propose using the variance-covariance matrix that describes the uncertainty of the transformation parameters (3D rotation and 3D translation) for view-based loop closure detection, followed by a graph-based optimization, to achieve a consistent 3D indoor map. To demonstrate and evaluate the effectiveness of the proposed method, experimental datasets obtained in three indoor environments with different levels of detail are used. The experimental results show that the proposed framework can create 3D indoor maps with an error of 11.97 cm in object space, which corresponds to a positional imprecision of around 1.5% over the 9 m distance travelled by the sensor.
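The coarse registration step relies on matching SURF keypoints between consecutive frames. Below is a minimal OpenCV sketch of that step only, assuming hypothetical frame file names; SURF lives in the contrib module (opencv-contrib-python built with non-free algorithms) and may be absent from some builds, and the paper's disparity-to-plane model and graph-based loop-closure optimization are not reproduced.

```python
# Hedged sketch: SURF keypoint matching between two RGB frames with OpenCV.
# Requires opencv-contrib with non-free algorithms enabled; file names are
# hypothetical placeholders.
import cv2

img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kp1, des1 = surf.detectAndCompute(img1, None)
kp2, des2 = surf.detectAndCompute(img2, None)

# Ratio-test matching to obtain putative correspondences for coarse registration.
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} putative correspondences")
```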


2020 ◽  
Author(s):  
Jiayi Lai

The next generation of weather and climate models will have an unprecedented level of resolution and model complexity, while also increasing the requirements on computation and memory speed. Reducing the precision of certain variables and using mixed-precision methods in atmospheric models can greatly improve computing and memory speed. However, to ensure the accuracy of the results, most models have over-designed their numerical precision, which means the resources occupied are much larger than those actually required. Previous studies have shown that the precision necessary for an accurate weather model has a clear scale dependence, with large spatial scales requiring higher precision than small scales; even at large scales the necessary precision is far below double precision. However, it is difficult to find a principled method for assigning different precisions to different variables and thereby avoiding unnecessary waste. This paper takes CESM1.2.1 as the research object, conducts a large number of reduced-precision tests, and proposes a new discrimination method similar to the CFL criterion. This method allows the correctness of a single variable to be verified, thereby determining which variables can use a lower level of precision without degrading the accuracy of the results.
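The basic experiment behind such per-variable screening is to demote one variable to lower precision, rerun, and compare against a double-precision reference. The sketch below shows only that generic demote-and-compare idea on a toy two-variable iteration; it is not the CFL-like criterion proposed in the abstract and not CESM code.

```python
# Hedged, generic sketch of per-variable precision screening on a toy model.
import numpy as np

def toy_step(t, q, dt=1e-3, kappa=0.1):
    """Stand-in for one model time step coupling two variables."""
    return t + dt * kappa * q, q - dt * kappa * t

rng = np.random.default_rng(0)
t0 = rng.standard_normal(100_000)
q0 = rng.standard_normal(100_000)

# Reference run entirely in double precision.
t_ref, q_ref = t0.copy(), q0.copy()
for _ in range(1000):
    t_ref, q_ref = toy_step(t_ref, q_ref)

# Same run with one variable demoted to single precision.
t_mix, q_mix = t0.copy(), q0.astype(np.float32)
for _ in range(1000):
    t_mix, q_mix = toy_step(t_mix, q_mix)
    q_mix = q_mix.astype(np.float32)   # keep the demoted variable in float32

rel_err = np.abs(t_mix - t_ref).max() / np.abs(t_ref).max()
print(f"max relative error with q in float32: {rel_err:.2e}")
```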


2010 ◽  
Vol 3 ◽  
Author(s):  
Alexis Palmer ◽  
Taesun Moon ◽  
Jason Baldridge ◽  
Katrin Erk ◽  
Eric Campbell ◽  
...  

With the urgent need to document the world's dying languages, it is important to explore ways to speed up language documentation efforts. One promising avenue is to use techniques from computational linguistics to automate some of the process. Here we consider unsupervised morphological segmentation and active learning for creating interlinear glossed text (IGT) for the Mayan language Uspanteko. The practical goal is to produce a fully annotated corpus that is as accurate as possible given limited time for manual annotation. We discuss results from several experiments that suggest there is indeed much promise in these methods but also show that further development is necessary to make them robustly useful for a wide range of conditions and tasks. We also provide a detailed discussion of how two documentary linguists perceived machine support in IGT production and how their annotation performance varied with different levels of machine support.
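Active learning in this setting means repeatedly asking the annotator to label the examples the model is least certain about. The sketch below shows uncertainty sampling in that generic sense with scikit-learn; the classifier, features and "oracle" are toy placeholders, not the Uspanteko IGT pipeline used in the paper.

```python
# Hedged sketch of uncertainty-sampling active learning for a gloss labeller.
# Data, features and the oracle are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_pool = rng.standard_normal((5000, 20))                 # unlabeled items (toy features)
y_pool = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)   # oracle = the human annotator

labeled = list(range(50))                                # small seed set
for _ in range(5):
    clf = LogisticRegression(max_iter=200).fit(X_pool[labeled], y_pool[labeled])
    proba = clf.predict_proba(X_pool)
    margin = np.abs(proba[:, 1] - proba[:, 0])           # small margin = uncertain
    picks = [i for i in np.argsort(margin) if i not in labeled][:50]
    labeled.extend(picks)                                 # "annotate" the most uncertain items
print(f"labeled {len(labeled)} of {len(X_pool)} items")
```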


Author(s):  
Anandakumar Haldorai ◽  
Shrinand Anandakumar

The segmentation step of therapy treatment includes a detailed examination of medical images. In diagnosis, clinical research, and patient management, medical images are mainly acquired through radiographic methods, so image processing software for medical imaging is crucial. The analysis of a medical image can be improved and accelerated using a bioMIP technique. This article presents a biomedical imaging software tool that aims to provide a similar level of programmability while investigating pipelined processing solutions. These tools mimic entire systems composed of many of the recommended processing segments, arranged in configurations defined by a schematic framework. In this paper, 15 biomedical imaging technologies are evaluated on a number of different levels. The comparison's primary goal is to collect and analyze data in order to suggest which medical imaging program users of various operating systems should use when analyzing various kinds of images. The article includes a results table that is reviewed.


2020 ◽  
Vol 37 (6) ◽  
pp. 2193-2211 ◽  
Author(s):  
Shengquan Wang ◽  
Chao Wang ◽  
Yong Cai ◽  
Guangyao Li

Purpose The purpose of this paper is to improve the computational speed of solving nonlinear dynamics by using parallel methods and a mixed-precision algorithm on graphics processing units (GPUs). The computational efficiency of traditional central processing unit (CPU)-based computer-aided engineering software has struggled to satisfy the needs of scientific research and practical engineering, especially for nonlinear dynamic problems. Moreover, when calculations are performed on GPUs, double-precision operations are slower than single-precision operations. This paper therefore implements mixed precision for nonlinear dynamic problem simulation using the Belytschko-Tsay (BT) shell element on a GPU. Design/methodology/approach To minimize data transfer between heterogeneous architectures, the parallel computation of the fully explicit finite element (FE) calculation is realized using a vectorized thread-level parallelism algorithm. An asynchronous data transmission strategy and a novel dependency-relationship-link-based method for efficiently solving the parallel explicit shell element equations are used to improve the GPU utilization ratio. Finally, this paper implements mixed precision for nonlinear dynamic problem simulation using the BT shell element on a GPU and compares it to the CPU-based serially executed program and a GPU-based double-precision parallel computing program. Findings For a car body model containing approximately 5.3 million degrees of freedom, the computational speed is improved 25 times over CPU sequential computation, and by approximately 10% over the double-precision parallel computing method. The accuracy error of the mixed-precision computation is small and can satisfy the requirements of practical engineering problems. Originality/value This paper realizes a novel FE parallel computing procedure for nonlinear dynamic problems using a mixed-precision algorithm on a CPU-GPU platform. Compared with the CPU serial program, the program implemented in this article obtains a 25-times acceleration ratio when calculating a model of 883,168 elements, which greatly improves the calculation speed for solving nonlinear dynamic problems.
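A common mixed-precision pattern for explicit dynamics is to do the element-level force work in single precision while keeping the integrated state in double precision. The sketch below illustrates only that pattern with a toy force on a GPU via CuPy (assumed installed, with a CUDA device available); it is not the paper's BT shell kernels, asynchronous transfer strategy or dependency-link scheme.

```python
# Hedged sketch of the mixed-precision pattern for explicit time integration on
# a GPU with CuPy. The force model is a toy linear restoring force.
import cupy as cp

n_nodes = 1_000_000
mass = cp.ones(n_nodes, dtype=cp.float64)
disp = cp.zeros(n_nodes, dtype=cp.float64)
vel = cp.zeros(n_nodes, dtype=cp.float64)
dt = 1e-6

def internal_force_f32(disp):
    """Stand-in for single-precision element force evaluation."""
    return -1.0e4 * disp.astype(cp.float32)

f_ext = cp.zeros(n_nodes, dtype=cp.float32)
f_ext[0] = 1.0

for step in range(100):
    f = (f_ext + internal_force_f32(disp)).astype(cp.float64)  # promote before integration
    acc = f / mass
    vel += dt * acc        # explicit update kept in double precision
    disp += dt * vel
```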


2020 ◽  
Author(s):  
Oriol Tintó ◽  
Stella Valentina Paronuzzi Ticco ◽  
Mario C. Acosta ◽  
Miguel Castrillo ◽  
Kim Serradell ◽  
...  

One of the requirements to keep improving the science produced using NEMO is to enhance its computational performance. The interest in improving its capability to use the computational infrastructure efficiently is two-fold: on one side, there are experiments that would only be possible if a certain throughput threshold is achieved; on the other side, any development that increases efficiency helps to save resources while reducing the environmental impact of our experiments. One of the opportunities that has raised interest in the last few years is the optimization of numerical precision. Historical reasons led many computational models to over-engineer their numerical precision; correcting this mismatch can pay back in terms of efficiency and throughput. In this direction, research was carried out to safely reduce the numerical precision in NEMO, which led to a mixed-precision version of the model. The implementation has been developed following the approach proposed by Tintó et al. (2019), in which the variables that require double precision are identified automatically and the remaining ones are switched to single precision. The implementation will be released in 2020, and this work presents its evaluation in terms of both performance and scientific results.
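The automatic identification of double-precision variables can be framed as a search: demote a group of variables, check the model output against a double-precision reference, and recursively narrow down the failing groups. The sketch below shows that divide-and-conquer idea on a toy stand-in; it is not the actual tool of Tintó et al. (2019), the variable names are hypothetical, and the bisection ignores possible interactions between variables.

```python
# Hedged sketch of a divide-and-conquer search for variables that must stay in
# double precision. run_and_check is a toy stand-in, not a NEMO run.
SENSITIVE = {"ssh", "temp"}        # hypothetical: variables that truly need double

def run_and_check(demoted):
    """Pretend model run: 'passes' only if no sensitive variable is demoted."""
    return SENSITIVE.isdisjoint(demoted)

def find_double_vars(variables):
    """Return the variables that cannot be demoted, by recursive bisection."""
    if run_and_check(set(variables)):
        return set()               # the whole group is safe in single precision
    if len(variables) == 1:
        return set(variables)      # a single failing variable must stay double
    mid = len(variables) // 2
    return find_double_vars(variables[:mid]) | find_double_vars(variables[mid:])

all_vars = ["ssh", "temp", "sal", "u", "v", "w", "rho", "ice"]
print(find_double_vars(all_vars))  # -> {'ssh', 'temp'}
```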

