message passing interface
Recently Published Documents


Total documents: 478 (five years: 119)
H-index: 24 (five years: 7)

Author(s):  
Gabriel Espiñeira ◽  
Antonio J. García-Loureiro ◽  
Natalia Seoane

Abstract In the current technology node, purely classical numerical simulators lack the precision needed to obtain valid results. At the same time, the simulation of fully quantum models can be a cumbersome task in certain studies such as device variability analysis, since a single simulation can take up to weeks to compute and hundreds of device configurations need to be analyzed to obtain statistically significant results. A good compromise between fast and accurate results is to add corrections to the classical simulation that are able to reproduce the quantum nature of matter. In this context, we present a new approach to Schrödinger equation-based quantum corrections. We have implemented it using the Message Passing Interface in our in-house semiconductor simulation framework, VENDES, which is capable of running in distributed systems that allow for more accurate results in a reasonable time frame. Using a 12-nm-gate-length gate-all-around nanowire FET (GAA NW FET) as a benchmark device, the new implementation shows almost perfect agreement in the output data, with less than a 2% difference between the cases using 1 and 16 processes. Also, a reduction of up to 98% in the computational time has been found when comparing the sequential and the 16-process simulations. For a reasonably dense mesh of 150k nodes, a variability study of 300 individual simulations can now be performed with VENDES in approximately 2.5 days instead of an estimated sequential execution of 137 days.
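
The timing figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above (an estimated 137 days sequentially, approximately 2.5 days with 16 processes); the variable names are illustrative and not taken from VENDES.

```python
# Figures quoted in the abstract above (both are approximate in the source).
sequential_days = 137.0  # estimated sequential time for the 300-simulation study
parallel_days = 2.5      # approximate time for the same study with 16 MPI processes

# Ratio of sequential to parallel wall-clock time.
speedup = sequential_days / parallel_days

# Fraction of the sequential time that is saved.
reduction = 1.0 - parallel_days / sequential_days

print(f"speedup: {speedup:.1f}x")
print(f"time reduction: {reduction:.1%}")
```

The resulting reduction rounds to about 98%, which is consistent with the "up to 98%" figure reported in the abstract.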


Author(s):  
Aaron Young ◽  
Jay Taves ◽  
Asher Elmquist ◽  
Simone Benatti ◽  
Alessandro Tasora ◽  
...  

Abstract We describe a simulation environment that enables the design and testing of control policies for off-road mobility of autonomous agents. The environment is demonstrated in conjunction with the training and assessment of a reinforcement learning policy that uses sensor fusion and inter-agent communication to enable the movement of mixed convoys of human-driven and autonomous vehicles. Policies learned on rigid terrain are shown to transfer to hard (silt-like) and soft (snow-like) deformable terrains. The environment described performs the following: multi-vehicle multibody dynamics co-simulation in a time/space-coherent infrastructure that relies on the Message Passing Interface standard for low-latency parallel computing; sensor simulation (e.g., camera, GPS, IMU); simulation of a virtual world that can be altered by the agents present in the simulation; training that uses reinforcement learning to 'teach' the autonomous vehicles to drive in an obstacle-riddled course. The software stack described is open source.


2021 ◽  
Vol 14 (12) ◽  
pp. 7477-7495
Author(s):  
Rafael Lago ◽  
Thomas Gastine ◽  
Tilman Dannert ◽  
Markus Rampp ◽  
Johannes Wicht

Abstract. We discuss two parallelization schemes for MagIC, an open-source, high-performance, pseudo-spectral code for the numerical solution of the magnetohydrodynamics equations in a rotating spherical shell. MagIC calculates the non-linear terms on a numerical grid in spherical coordinates, while the time step updates are performed on radial grid points with a spherical harmonic representation of the lateral directions. Several transforms are required to switch between the different representations. The established hybrid parallelization of MagIC uses message-passing interface (MPI) distribution in radius and relies on existing fast spherical transforms using OpenMP. Our new two-dimensional MPI decomposition implementation also distributes the latitudes or the azimuthal wavenumbers across the available MPI tasks and compute cores. We discuss several non-trivial algorithmic optimizations and the different data distribution layouts employed by our scheme. In particular, the two-dimensional distribution data layout yields a code that strongly scales well beyond the limit of the current one-dimensional distribution. We also show that the two-dimensional distribution implementation, although not yet fully optimized, can already be faster than the existing finely optimized hybrid parallelization when using many thousands of CPU cores. Our analysis indicates that the two-dimensional distribution variant can be further optimized to also surpass the performance of the one-dimensional distribution for a few thousand cores.
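
To illustrate the difference between a one-dimensional and a two-dimensional distribution, here is a minimal sketch of how a flat MPI rank could be mapped onto a two-dimensional process grid, with radial points split along one axis and latitudes along the other. This is not MagIC's actual layout: the function names, the row-major ordering, and the contiguous block split are all assumptions made for illustration. In a real MPI program the rank would come from `MPI_Comm_rank`.

```python
def rank_to_coords(rank, n_radial_groups, n_lateral_groups):
    """Map a flat rank to (radial_group, lateral_group), row-major (assumed ordering)."""
    if not 0 <= rank < n_radial_groups * n_lateral_groups:
        raise ValueError("rank lies outside the process grid")
    return divmod(rank, n_lateral_groups)


def block_range(n_items, n_parts, part):
    """Half-open index range [start, stop) of contiguous block `part`.

    When n_items is not divisible by n_parts, the first blocks get one extra item.
    """
    base, extra = divmod(n_items, n_parts)
    start = part * base + min(part, extra)
    stop = start + base + (1 if part < extra else 0)
    return start, stop


def local_slab(rank, n_r, n_theta, n_radial_groups, n_lateral_groups):
    """Radial and latitudinal index ranges owned by `rank` (hypothetical helper)."""
    pr, pt = rank_to_coords(rank, n_radial_groups, n_lateral_groups)
    return (block_range(n_r, n_radial_groups, pr),
            block_range(n_theta, n_lateral_groups, pt))


if __name__ == "__main__":
    # 8 radial points and 6 latitudes spread over a 2 x 3 process grid.
    for rank in range(6):
        print(rank, local_slab(rank, n_r=8, n_theta=6,
                               n_radial_groups=2, n_lateral_groups=3))
```

With a purely radial split, the number of useful ranks is capped by the number of radial points; splitting a second dimension raises that cap to the product of the two counts, which is the kind of limit the abstract refers to.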


Processes ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 1980
Author(s):  
Lihua Shen ◽  
Hui Liu ◽  
Zhangxin Chen

In this paper, the deterministic ensemble Kalman filter is implemented in parallel using the message passing interface (MPI), based on our in-house black oil simulator. The implementation is separated into two cases: (1) the ensemble size is greater than the processor number and (2) the ensemble size is smaller than or equal to the processor number. Numerical experiments for estimations of three-phase relative permeabilities represented by power-law models with both known endpoints and unknown endpoints are presented. It is shown that with known endpoints, good estimations can be obtained. With unknown endpoints, good estimations can still be obtained using more observations and a larger ensemble size. Computational time is reported to show that the run time is greatly reduced with more CPU cores. The MPI speedup is over 70% for a small ensemble size and 77% for a large ensemble size with up to 640 CPU cores.
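
The two cases named in this abstract can be pictured with a small sketch. The abstract does not say how ensemble members are actually mapped to processors, so the mapping below is an assumption: round-robin assignment when there are more members than processors, and contiguous processor groups sharing one member otherwise. The function name `assign_ensemble` is hypothetical.

```python
def assign_ensemble(n_members, n_procs):
    """Return {rank: [member indices]} for the two cases (illustrative mapping only)."""
    if n_members < 1 or n_procs < 1:
        raise ValueError("need at least one member and one processor")

    if n_members > n_procs:
        # Case 1: more members than processors.
        # Each rank runs several members one after another (round-robin).
        return {rank: list(range(rank, n_members, n_procs))
                for rank in range(n_procs)}

    # Case 2: at most as many members as processors.
    # Ranks form contiguous groups, and each group works on a single member.
    return {rank: [rank * n_members // n_procs] for rank in range(n_procs)}


if __name__ == "__main__":
    print(assign_ensemble(5, 2))  # case 1: each rank holds several members
    print(assign_ensemble(3, 8))  # case 2: several ranks share each member
```

In the second case every member is guaranteed at least one rank, and the ranks within a group would then cooperate on that member's simulation.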


Water ◽  
2021 ◽  
Vol 13 (21) ◽  
pp. 3122
Author(s):  
Leonardo Primavera ◽  
Emilia Florio

The possibility of creating a flood wave in a river network depends on the geometric properties of the river basin. Among the models that try to forecast the Instantaneous Unit Hydrograph (IUH) of rainfall precipitation, the so-called Multifractal Instantaneous Unit Hydrograph (MIUH) by De Bartolo et al. (2003) rather successfully connects the multifractal properties of the river basin to the observed IUH. Such properties can be assessed through different types of analysis (fixed-size algorithm, correlation integral, fixed-mass algorithm, sandbox algorithm, and so on). The fixed-mass algorithm is the one that produces the most precise estimate of the properties of the multifractal spectrum that are relevant for the MIUH model. However, a disadvantage of this method is that it requires very long computational times to produce the best possible results. In a previous work, we proposed a parallel version of the fixed-mass algorithm that uses the Message Passing Interface (MPI), a standard for distributed-memory clusters, and reduces the computational times almost proportionally to the number of Central Processing Unit (CPU) cores available on the computational machine. In the present work, we further improved the code to include the use of the Open Multi-Processing (OpenMP) paradigm, which facilitates execution and improves the computational speed-up on single-processor, multi-core workstations, which are much more common than multi-node clusters. Moreover, the assessment of the multifractal spectrum has also been improved through a direct computation method. Currently, to the best of our knowledge, this code represents the state of the art for a fast evaluation of the multifractal properties of a river basin, and it opens up a new scenario for an effective flood forecast in reasonable computational times.


2021 ◽  
Author(s):  
Giorgio Micaletto ◽  
Ivano Barletta ◽  
Silvia Mocavero ◽  
Ivan Federico ◽  
Italo Epicoco ◽  
...  

Abstract. This paper presents the MPI-based parallelization of the three-dimensional hydrodynamic model SHYFEM (System of HydrodYnamic Finite Element Modules). The original sequential version of the code was parallelized in order to reduce the execution time of high-resolution configurations using state-of-the-art HPC systems. A distributed memory approach was used, based on the message passing interface (MPI). Optimized numerical libraries were used to partition the unstructured grid (with a focus on load balancing) and to solve the sparse linear system of equations in parallel in the case of semi-to-fully implicit time stepping. The parallel implementation of the model was validated by comparing the outputs with those obtained from the sequential version. The performance assessment demonstrates a good level of scalability with a realistic configuration used as benchmark.
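
Validating a parallel code against its sequential version usually comes down to comparing the two sets of outputs field by field. The abstract does not describe the metric used for SHYFEM, so the following is only a generic sketch under that caveat: it computes the largest relative difference between two equally long sequences of values. The function name and the `floor` guard against division by zero are assumptions.

```python
def max_relative_difference(reference, candidate, floor=1e-12):
    """Largest |candidate - reference| / |reference| over paired values.

    `floor` bounds the denominator from below so zero reference values
    do not cause a division by zero.
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same length")
    return max(
        (abs(c - r) / max(abs(r), floor) for r, c in zip(reference, candidate)),
        default=0.0,
    )


if __name__ == "__main__":
    sequential_output = [1.00, 2.00, 4.00]   # made-up sample values
    parallel_output = [1.00, 2.02, 3.98]     # made-up sample values
    print(max_relative_difference(sequential_output, parallel_output))
```

Parallel runs often differ from sequential ones at round-off level because floating-point sums are accumulated in a different order, so a tolerance is normally used rather than exact equality.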


2021 ◽  
Vol 14 (10) ◽  
pp. 6541-6569
Author(s):  
Phillip D. Alderman

Abstract. The Decision Support System for Agrotechnology Transfer Cropping Systems Model (DSSAT-CSM) is a widely used crop modeling system that has been integrated into large-scale modeling frameworks. Existing frameworks generate spatially explicit simulated outputs at grid points through an inefficient process of translation from binary spatially referenced inputs to point-specific text input files, followed by translation and aggregation back from point-specific text output files to binary spatially referenced outputs. The main objective of this paper was to document the design and implementation of a parallel gridded simulation framework for DSSAT-CSM. A secondary objective was to provide preliminary analysis of execution time and scaling of the new parallel gridded framework. The parallel gridded framework includes improved code for model-internal data transfer, gridded input–output with the Network Common Data Form (NetCDF) library, and parallelization of simulations using the Message Passing Interface (MPI). Validation simulations with the DSSAT-CSM-CROPSIM-CERES-Wheat model revealed subtle discrepancies in simulated yield due to the rounding of soil parameters in the input routines of the standard DSSAT-CSM. Utilizing NetCDF for direct input–output produced a 3.7- to 4-fold reduction in execution time compared to R- and text-based input–output. Parallelization improved execution time for both versions with between 12.2- (standard version) and 13.4-fold (parallel gridded version) speed-up when comparing 1 to 16 compute cores. Estimates of parallelization of computation ranged between 99.2 % (standard version) and 97.3 % (parallel gridded version), indicating potential for scaling to higher numbers of compute cores.
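
One common way to read a "parallelized fraction" such as the percentages quoted above is through Amdahl's law, which bounds the speed-up attainable when only part of a program runs in parallel. The abstract does not state how its estimates were obtained, so the sketch below is a textbook illustration only. It ignores communication and input-output overhead and should not be expected to reproduce the measured 16-core speed-ups. The function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Idealized speed-up on n_cores when parallel_fraction of the work is parallel."""
    if not 0.0 <= parallel_fraction <= 1.0 or n_cores < 1:
        raise ValueError("need 0 <= parallel_fraction <= 1 and n_cores >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


def amdahl_limit(parallel_fraction):
    """Upper bound on speed-up as the core count grows without bound."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("need 0 <= parallel_fraction <= 1")
    if parallel_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    # Parallel fractions quoted in the abstract above.
    for label, p in (("standard", 0.992), ("parallel gridded", 0.973)):
        print(f"{label}: 64 cores -> {amdahl_speedup(p, 64):.1f}x, "
              f"limit -> {amdahl_limit(p):.0f}x")
```

Under this idealized model, a parallel fraction in the high nineties leaves substantial headroom beyond 16 cores, which matches the abstract's remark about potential for scaling.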


2021 ◽  
Author(s):  
Kotaro Anno ◽  
George J. Moridis ◽  
Thomas A. Blasingame

Abstract The objectives of this study are to develop (a) the Julia Flow and Transport Simulator (JFTS), a serial and parallel, high-performance, non-isothermal, multi-phase, multi-component general simulator of flow and transport through porous/fractured media, and (b) an associated module that describes quantitatively the Equation-of-State (EOS) of the complete H2O+CH4 system by covering all combinations of phase coexistence that are possible in geologic media and including all the regions of the phase diagram that involve CH4-hydrates. The resulting simulator (hereafter referred to as the JFTS+H code) can describe all possible scenarios of hydrate occurrence, dissociation and formation/evolution and is to be used for (a) the investigation of problems of gas production from natural CH4-hydrate accumulations in geologic media, as well as for (b) the analysis of any laboratory experiments involving CH4-hydrates. As indicated by the JFTS name, this simulator is written in the Julia programming language and its parallelization is based on the Message Passing Interface (MPI) approach. The JFTS+H simulator is a fully-implicit, Jacobian-based compositional simulator that describes the accumulation, flow and transport of heat, and up to four mass components (H2O, CH4, CH4-hydrate and a water-soluble inhibitor) distributed among four possible phases (aqueous, gas, hydrate, and ice) in complex 3D geologic systems. The dissociation and formation of CH4-hydrates can be described using either an equilibrium or a kinetic model. The automatic derivative capability of Julia greatly simplifies and enhances the Jacobian computations.
The MPI Interface (Blaise, 2019) is implemented in all components of the code, and the METIS library (Karypis, 2013) is used for the domain decomposition needed for the effective parallelization of the solution of the Jacobian matrix equation that is accomplished using the LIS library (Nishida, 2010) of parallel Conjugate Gradient solvers for large systems of simultaneous linear equations. The JFTS+H code can model the fluid flow, thermal and geochemical processes associated with the formation and dissociation of CH4-hydrates in geological media, either in laboratory or in natural hydrate accumulations. This code can simulate any combination of the three possible gas hydrate dissociation methods (depressurization, thermal stimulation, and inhibitor effects), and computes all associated parameters describing the system behavior. The JFTS+H results show very good agreement with solutions of standard reference problems, and of large 2D and 3D problems obtained from another well-established and widely used numerical simulator. The code exploits the speed, computational efficiency and low memory requirements of the Julia programming language. The parallel architecture of JFTS+H addresses the persistent problem of very large computational demands in serial hydrate simulations by using multiple processors to reduce the overall execution time and achieve scalable speedups. The code minimizes communications between processors and maximizes computations within the same computational node, which has important consequences (especially when coupled with the automatic derivative capabilities of Julia) on performance in the development of the Jacobian matrix. An optimal LIS solver is recommended for this type of problem after evaluating different options. This approach provides both speedup and computational efficiency results when different numbers of processors are called in the solution process. 
This work is believed to be the first application of Julia (a new, highly efficient language designed for demanding scientific computations) to create a simulator for flow and transport in porous media. JFTS+H is a fast, robust parallel simulator that uses the most recent scientific advances to account for all known processes in a dynamic hydrate system and works seamlessly on any computational platform (from laptop computers to workstations, to clusters and supercomputers with thousands of processors).


Author(s):  
Indar Sugiarto ◽  
Doddy Prayogo ◽  
Henry Palit ◽  
Felix Pasila ◽  
Resmana Lim ◽  
...  

This paper describes a prototype of a computing platform dedicated to artificial intelligence explorations. The platform, dubbed PakCarik, is essentially a high throughput computing platform with GPU (graphics processing unit) acceleration. PakCarik is an Indonesian acronym for Platform Komputasi Cerdas Ramah Industri Kreatif, which can be translated as “Creative Industry friendly Intelligence Computing Platform”. This platform aims to provide a complete development and production environment for AI-based projects, especially those that rely on machine learning and multiobjective optimization paradigms. The method for constructing PakCarik was based on a computer hardware assembling technique that uses commercial off-the-shelf hardware and was tested on several AI-related application scenarios. The testing methods in this experiment include High-Performance Linpack (HPL) benchmarking, message passing interface (MPI) benchmarking, and TensorFlow (TF) benchmarking. From the experiment, the authors observe that PakCarik's performance is quite similar to that of commonly used cloud computing services such as Google Compute Engine and Amazon EC2, even though it falls a bit behind a dedicated AI platform such as the Nvidia DGX-1 used in the benchmarking experiment. Its maximum computing performance was measured at 326 Gflops. The authors conclude that PakCarik is ready to be deployed in real-world applications and that it can be made even more powerful by adding more GPU cards.

