LOC program for line radiative transfer

2020 · Vol. 644 · pp. A151 · Author(s): Mika Juvela

Context. Radiative transfer (RT) modelling is part of many astrophysical simulations. It is used to make synthetic observations and to assist the analysis of observations. We concentrate on modelling the radio lines emitted by the interstellar medium. In connection with high-resolution models, this can pose a significant computational challenge. Aims. Our aim is to provide a line RT program that makes good use of multi-core central processing units (CPUs) and graphics processing units (GPUs). Parallelisation is essential to speed up computations and to enable large modelling tasks with personal computers. Methods. The program LOC is based on ray-tracing (i.e. not Monte Carlo) and uses standard accelerated lambda iteration methods for faster convergence. The program works on 1D and 3D grids. The 1D version makes use of symmetries to speed up the RT calculations. The 3D version works with octree grids and, to enable calculations with large models, is optimised for low memory usage. Results. Tests show that LOC results agree with other RT codes to within ∼2%. This is typical of code-to-code differences, which are often related to different interpretations of the model set-up. LOC run times compare favourably, especially with those of Monte Carlo codes. In 1D tests, LOC runs were up to a factor of ∼20 faster on a GPU than on a single CPU core. In spite of the complex path calculations, a speed-up of up to ∼10 was also observed for 3D models using octree discretisation. GPUs enable calculations of models with hundreds of millions of cells, as are encountered in the context of large-scale simulations of interstellar clouds. Conclusions. LOC shows good performance and accuracy and is able to handle many RT modelling tasks on personal computers. It is written in Python, with only the computing-intensive parts implemented as compiled OpenCL kernels. It can therefore also serve as a platform for further experimentation with alternative RT implementation details.
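
The structure described in the conclusions, Python host code with only the compute-intensive parts in compiled OpenCL kernels, can be sketched in a few lines with pyopencl. This is an illustrative sketch, not LOC's actual code; the kernel body and buffer names are placeholders.

    import numpy as np
    import pyopencl as cl

    # Placeholder kernel: one work item per model cell; the real line-RT update
    # (level populations, ray integration) would live here.
    src = """
    __kernel void cell_update(__global const float *tex, __global float *out) {
        int i = get_global_id(0);
        out[i] = 2.0f * tex[i];
    }
    """

    ctx   = cl.create_some_context()          # selects a CPU or GPU device
    queue = cl.CommandQueue(ctx)
    prg   = cl.Program(ctx, src).build()      # kernels are compiled at run time

    tex = np.random.rand(100_000).astype(np.float32)   # e.g. per-cell excitation data
    out = np.empty_like(tex)
    mf  = cl.mem_flags
    d_in  = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=tex)
    d_out = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

    prg.cell_update(queue, tex.shape, None, d_in, d_out)   # launch on the device
    cl.enqueue_copy(queue, out, d_out)                     # copy results back to Python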

2019 · Vol. 622 · pp. A79 · Author(s): Mika Juvela

Context. Thermal dust emission carries information on physical conditions and dust properties in many astronomical sources. Because observations represent a sum of emission along the line of sight, their interpretation often requires radiative transfer (RT) modelling. Aims. We describe a new RT program, SOC, for computations of dust emission, and examine its performance in simulations of interstellar clouds with external and internal heating. Methods. SOC implements the Monte Carlo RT method as a parallel program for shared-memory computers. It can be used to study dust extinction, scattering, and emission. We tested SOC with realistic cloud models and examined the convergence and noise of the dust-temperature estimates and of the resulting surface-brightness maps. Results. SOC has been demonstrated to produce accurate estimates for dust scattering and for thermal dust emission. It performs well with both CPUs and GPUs, the latter providing a speed-up of processing time by up to an order of magnitude. In the test cases, accelerated lambda iterations (ALIs) improved the convergence rates but were also sensitive to Monte Carlo noise. Run-time refinement of the hierarchical-grid models did not help in reducing the run times required for a given accuracy of solution. The use of a reference field, without ALI, works more robustly, and also allows the run time to be optimised if the number of photon packages is increased only as the iterations progress. Conclusions. The use of GPUs in RT computations should be investigated further.
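
The run-time optimisation mentioned in the conclusions, increasing the number of photon packages only as the iterations progress, can be sketched schematically. The function run_iteration and the parameter values below are hypothetical, not SOC's interface.

    def iterate_dust_temperature(run_iteration, n_start=10_000, growth=2.0,
                                 max_iter=10, tol=1e-3):
        """run_iteration(n_packages) performs one Monte Carlo sweep plus a
        temperature update and returns the maximum relative change (hypothetical)."""
        n_packages = n_start
        for it in range(max_iter):
            change = run_iteration(int(n_packages))
            if change < tol:          # converged with respect to the previous iteration
                break
            n_packages *= growth      # early iterations stay cheap, final ones low-noise
        return it + 1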


2020 · Vol. 22 (5) · pp. 1217–1235 · Author(s): M. Morales-Hernández, M. B. Sharif, S. Gangrade, T. T. Dullo, S.-C. Kao, et al.

Abstract This work presents a vision of future water resources hydrodynamics codes that can fully utilize the strengths of modern high-performance computing (HPC). Advances in computing power, formerly driven by improvements in central processing units (CPUs), now focus on parallel computing and, in particular, the use of graphics processing units (GPUs). However, this shift to a parallel framework requires refactoring the code to make efficient use of the data, and even changing the nature of the algorithm that solves the system of equations. These concepts, along with other features such as the precision of the computations, the management of dry regions, and input/output data, are analyzed in this paper. A 2D multi-GPU flood code applied to a large-scale test case is used to corroborate our statements and ascertain the new challenges for the next-generation parallel water resources codes.
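
One of the features listed above, the management of dry regions, can be illustrated with a small sketch: cells below a depth threshold are masked so that update work is spent only on wet cells. This is a generic illustration, not the authors' multi-GPU code; the array names and the update itself are placeholders.

    import numpy as np

    def update_depth(h, dh_dt, dt, h_dry=1e-6):
        wet = h > h_dry                         # boolean mask of wet cells
        h_new = h.copy()
        h_new[wet] += dt * dh_dt[wet]           # advance only the wet cells
        np.clip(h_new, 0.0, None, out=h_new)    # never allow negative depths
        return h_new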


Author(s): Sebastião Miranda, Jonas Feldt, Frederico Pratas, Ricardo A. Mata, Nuno Roma, et al.

A novel perturbative Monte Carlo mixed quantum mechanics (QM)/molecular mechanics (MM) approach has recently been developed to simulate molecular systems in complex environments. However, the accuracy required to simulate such complex molecular systems efficiently usually comes at the cost of long execution times. To alleviate this problem, a new parallelization strategy for multi-level Monte Carlo molecular simulations on heterogeneous systems is proposed herein. It simultaneously exploits fine-grained (at the data level), coarse-grained (at the Markov chain level) and task-grained (pure QM, pure MM and QM/MM procedures) parallelism to ensure efficient execution on heterogeneous systems composed of central processing units (CPUs) and multiple, possibly different, graphics processing units (GPUs). This is achieved by making use of the OpenCL library, together with appropriate dynamic load balancing schemes. In the conducted evaluation with real benchmarking data, a speed-up of 56x was observed in the computational bottleneck part, which results in a global speed-up of 38x for the whole simulation, reducing the time of a typical simulation from 80 hours to only 2 hours.
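
The dynamic load-balancing idea can be sketched generically: each device runs a worker that pulls independent blocks of Monte Carlo work from a shared queue, so faster devices naturally process more blocks. The sketch below is an assumption-level illustration, not the authors' implementation; run_block stands in for dispatching the QM, MM, or QM/MM kernels for one block.

    import queue
    import threading

    def run_block(device_id, block):
        ...  # hypothetical: enqueue the OpenCL kernels for this block on the device

    def balance(devices, blocks):
        work = queue.Queue()
        for b in blocks:
            work.put(b)

        def worker(device_id):
            while True:
                try:
                    block = work.get_nowait()   # pull the next block, if any remain
                except queue.Empty:
                    return
                run_block(device_id, block)

        threads = [threading.Thread(target=worker, args=(d,)) for d in devices]
        for t in threads:
            t.start()
        for t in threads:
            t.join()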


2015 · Vol. 138 (3) · Author(s): Javier Crespo, Roque Corral, Jesus Pueblas

An implicit harmonic balance (HB) method for modeling the unsteady nonlinear periodic flow about vibrating airfoils in turbomachinery is presented. An implicit edge-based three-dimensional Reynolds-averaged Navier–Stokes (RANS) solver for unstructured grids, which runs on both central processing units (CPUs) and graphics processing units (GPUs), is used. The HB method performs a spectral discretization of the time derivatives and marches in pseudotime a new system of equations in which the unknowns are the flow variables at different time samples. The application of the method to vibrating airfoils is discussed. It is shown that a time-spectral scheme may achieve the same temporal accuracy at a much lower computational cost than a backward finite-difference method, at the expense of using more memory. The performance of the implicit solver has been assessed with several application examples. A speed-up factor of 10 is obtained between the spectral and finite-difference versions of the code, and an additional speed-up factor of 10 is obtained when the code is ported to GPUs, for a total speed-up factor of 100. The performance of the solver on GPUs has been assessed using the tenth standard aeroelastic configuration and a transonic compressor.
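
The core ingredient of a harmonic balance scheme, a spectral discretization of the time derivative over N samples of one period, can be written down in a few lines. The numpy sketch below illustrates the general technique, not the authors' RANS solver.

    import numpy as np

    def hb_derivative_matrix(N, omega=1.0):
        """Spectral time-derivative operator for N equispaced samples of a signal
        with period 2*pi/omega: (du/dt) at the samples is approximately D @ u."""
        k = np.fft.fftfreq(N, d=1.0 / N)           # integer harmonic indices
        F = np.fft.fft(np.eye(N), axis=0)          # DFT of each identity column
        D = np.fft.ifft((1j * k * omega)[:, None] * F, axis=0)
        return D.real                              # imaginary part is round-off only

    # quick check on u(t) = sin(omega*t): the operator should return omega*cos(omega*t)
    N, omega = 7, 2.0
    t = np.arange(N) * 2.0 * np.pi / (N * omega)
    u = np.sin(omega * t)
    assert np.allclose(hb_derivative_matrix(N, omega) @ u, omega * np.cos(omega * t))

In an HB solver this operator couples all the time samples into one steady problem, which is then marched in pseudotime as described above.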


2010 · Vol. 6 (S270) · pp. 397–400 · Author(s): Dominique Aubert

Abstract Graphics Processing Units (GPUs) offer a new way to accelerate numerical calculations by means of on-board massive parallelisation. We discuss two examples of GPU implementation relevant for cosmological simulations: an N-body particle-mesh solver and a radiative transfer code. The latter has also been ported to multi-GPU clusters. The range of acceleration (×30–×80) achieved here offers bright prospects for large-scale simulations driven by GPUs.


2021 · Vol. 17 (9) · pp. e1009410 · Author(s): Andrea Tangherloni, Marco S. Nobile, Paolo Cazzaniga, Giulia Capitoli, Simone Spolaor, et al.

Mathematical models of biochemical networks can largely facilitate the comprehension of the mechanisms at the basis of cellular processes, as well as the formulation of hypotheses that can be tested by means of targeted laboratory experiments. However, two issues might hamper the achievement of fruitful outcomes. On the one hand, detailed mechanistic models can involve hundreds or thousands of molecular species and their intermediate complexes, as well as hundreds or thousands of chemical reactions, a situation generally occurring in rule-based modeling. On the other hand, the computational analysis of a model typically requires the execution of a large number of simulations for its calibration or to test the effect of perturbations. As a consequence, the computational capabilities of modern Central Processing Units can easily be exceeded, possibly making the modeling of biochemical networks a worthless or ineffective effort. With the aim of overcoming the limitations of current state-of-the-art simulation approaches, we present in this paper FiCoS, a novel "black-box" deterministic simulator that effectively realizes both fine-grained and coarse-grained parallelization on Graphics Processing Units. In particular, FiCoS exploits two different integration methods, namely the Dormand–Prince and the Radau IIA, to efficiently solve both non-stiff and stiff systems of coupled Ordinary Differential Equations. We tested the performance of FiCoS against different deterministic simulators, considering models of increasing size and running analyses with increasing computational demands. FiCoS was able to speed up the computations dramatically, by up to 855×, proving to be a promising solution for the simulation and analysis of large-scale models of complex biological processes.
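
The integrator choice described above, a Dormand–Prince pair for non-stiff systems and Radau IIA for stiff ones, can be sketched on the CPU with SciPy, whose solve_ivp exposes both methods ("RK45" is a Dormand–Prince 5(4) pair, "Radau" is a fifth-order Radau IIA method). FiCoS itself implements these integrators on GPUs, which this sketch does not attempt to reproduce; the toy reaction network is invented for illustration.

    import numpy as np
    from scipy.integrate import solve_ivp

    def reactions(t, y, k1=1.0, k2=1e4):
        # toy two-species chain A -> B -> C; the large k2 makes the system stiff
        a, b = y
        return [-k1 * a, k1 * a - k2 * b]

    y0, t_span = [1.0, 0.0], (0.0, 10.0)
    non_stiff = solve_ivp(reactions, t_span, y0, method="RK45")   # Dormand-Prince
    stiff     = solve_ivp(reactions, t_span, y0, method="Radau")  # Radau IIA
    print(non_stiff.t.size, stiff.t.size)  # the stiff solver needs far fewer steps here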


2021 · Vol. 2 · Author(s): Lawrence M. Murray, Sumeetpal S. Singh, Anthony Lee

Abstract Monte Carlo algorithms simulate some prescribed number of samples, taking some random real time to complete the necessary computations. This work considers the converse: imposing a real-time budget on the computation, so that the number of samples simulated becomes random. To complicate matters, the real time taken for each simulation may depend on the sample produced, so that the samples themselves are not independent of their number, and a length bias with respect to compute time is apparent. This is especially problematic when a Markov chain Monte Carlo (MCMC) algorithm is used and the final state of the Markov chain, rather than an average over all states, is required, which is the case in parallel tempering implementations of MCMC. The length bias does not diminish with the compute budget in this case. It also occurs in sequential Monte Carlo (SMC) algorithms, which are the focus of this paper. We propose an anytime framework to address the concern, using a continuous-time Markov jump process to study the progress of the computation in real time. We first show that for any MCMC algorithm, the length bias of the final state's distribution due to the imposed real-time computing budget can be eliminated by using a multiple chain construction. The utility of this construction is then demonstrated on a large-scale SMC$^2$ implementation, using four billion particles distributed across a cluster of 128 graphics processing units on the Amazon EC2 service. The anytime framework imposes a real-time budget on the MCMC move steps within the SMC$^2$ algorithm, ensuring that all processors are simultaneously ready for the resampling step, demonstrably reducing idleness due to waiting times and providing substantial control over the total compute budget.
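
The multiple chain construction can be sketched schematically: K + 1 chains are advanced in turn under a real-time budget, and when the budget expires the chain whose update was in progress is discarded, leaving K states free of the length bias. The sketch below is an assumption-level illustration rather than the authors' construction in full detail; mcmc_step stands in for one MCMC transition whose run time may depend on the state.

    import time

    def mcmc_step(state):
        ...  # hypothetical: one MCMC transition with state-dependent compute time

    def anytime_mcmc(initial_states, budget_seconds):
        states = list(initial_states)             # K + 1 chains
        deadline = time.monotonic() + budget_seconds
        j = 0
        while time.monotonic() < deadline:
            states[j] = mcmc_step(states[j])      # advance one chain by one transition
            j = (j + 1) % len(states)             # round-robin over the chains
        # the budget ran out while chain j was (or was about to be) updated; its
        # state carries the length bias, so keep only the other K chains
        return states[:j] + states[j + 1:]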

