A Hybrid MPI-OpenMP Parallel Algorithm for the Assessment of the Multifractal Spectrum of River Networks

Water ◽  
2021 ◽  
Vol 13 (21) ◽  
pp. 3122
Author(s):  
Leonardo Primavera ◽  
Emilia Florio

The possibility that a flood wave forms in a river network depends on the geometric properties of the river basin. Among the models that try to forecast the Instantaneous Unit Hydrograph (IUH) of rainfall precipitation, the so-called Multifractal Instantaneous Unit Hydrograph (MIUH) by De Bartolo et al. (2003) rather successfully connects the multifractal properties of the river basin to the observed IUH. Such properties can be assessed through different types of analysis (fixed-size algorithm, correlation integral, fixed-mass algorithm, sandbox algorithm, and so on). The fixed-mass algorithm produces the most precise estimate of the properties of the multifractal spectrum that are relevant for the MIUH model. However, a disadvantage of this method is that it requires very long computational times to produce the best possible results. In a previous work, we proposed a parallel version of the fixed-mass algorithm based on the Message Passing Interface (MPI), a standard for distributed-memory clusters, which reduced the computational times almost proportionally to the number of Central Processing Unit (CPU) cores available on the machine. In the present work, we further improved the code to include the Open Multi-Processing (OpenMP) paradigm, which eases execution and improves the computational speed-up on single-processor, multi-core workstations, which are much more common than multi-node clusters. Moreover, the assessment of the multifractal spectrum has also been improved through a direct computation method. To the best of our knowledge, this code currently represents the state of the art for a fast evaluation of the multifractal properties of a river basin, and it opens up a new scenario for effective flood forecasting in reasonable computational times.
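
The following is a minimal sketch of the hybrid pattern the abstract describes, not the authors' actual code: MPI distributes contiguous blocks of pivot points across ranks, while OpenMP threads share the loop inside each rank. The routine fixed_mass_work is a stand-in for the real per-pivot fixed-mass kernel. Compile with, e.g., mpicc -fopenmp.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N_PIVOTS 100000

/* Placeholder for the per-pivot fixed-mass computation; the real kernel
 * finds the radius enclosing a fixed mass of points around each pivot. */
static double fixed_mass_work(int pivot) {
    return (double)pivot * 1e-6;   /* stand-in workload */
}

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* MPI level: one contiguous block of pivot points per rank. */
    int chunk = (N_PIVOTS + size - 1) / size;
    int lo = rank * chunk;
    int hi = (lo + chunk > N_PIVOTS) ? N_PIVOTS : lo + chunk;

    /* OpenMP level: the threads of one rank share that block. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local) schedule(dynamic)
    for (int p = lo; p < hi; p++)
        local += fixed_mass_work(p);

    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("aggregate over %d pivots = %g\n", N_PIVOTS, global);

    MPI_Finalize();
    return 0;
}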

2021 ◽  
Vol 8 (2) ◽  
pp. 169-180
Author(s):  
Mark Lin ◽  
Periklis Papadopoulos

Computational methods such as Computational Fluid Dynamics (CFD) traditionally yield a single output: one number, much like the result of a theoretical hand calculation. However, this paper shows that computational methods have inherent uncertainty, which can also be reported statistically. In numerical computation, because many factors affect the data collected, the data can be quoted as a mean value with standard deviations (error bars) to make data comparison meaningful. Where two data sets overlap within their uncertainty, they are said to be indistinguishable. A sample CFD problem pertaining to external aerodynamics was copied and run on 29 identical computers in a university computer lab. The expectation was that all 29 runs would return exactly the same result; in a few cases, however, the results turned out to be different. This is attributed to the parallelization scheme, which partitions the mesh to run in parallel on multiple cores of the computer. The distribution of the computational load is hardware-driven, depending on the resources available on each computer at the time. Details such as load balancing among multiple Central Processing Unit (CPU) cores using the Message Passing Interface (MPI) are transparent to the user. Partitioning software such as METIS or JOSTLE automatically divides the load among the processors. As such, the user has no control over the outcome of the CFD calculation even when the same problem is computed. Because of this, numerical uncertainty arises from parallel (multicore) computing. One way to resolve this issue is to compute problems on a single core, without mesh repartitioning. However, as this paper demonstrates, even this is not straightforward. Keywords: numerical uncertainty, parallelization, load-balancing, automotive aerodynamics
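
As a minimal illustration of the statistical reporting the paper advocates, the sketch below computes the mean and sample standard deviation of a repeated CFD output; the drag-coefficient values are invented placeholders, not the paper's data.

#include <stdio.h>
#include <math.h>

int main(void) {
    /* Hypothetical results of the same case run on several machines. */
    double cd[] = {0.3241, 0.3241, 0.3243, 0.3240, 0.3241, 0.3242};
    int n = sizeof cd / sizeof cd[0];

    double mean = 0.0;
    for (int i = 0; i < n; i++) mean += cd[i];
    mean /= n;

    double var = 0.0;                  /* sample variance, n - 1 */
    for (int i = 0; i < n; i++) var += (cd[i] - mean) * (cd[i] - mean);
    var /= (n - 1);

    printf("Cd = %.5f +/- %.5f (n = %d)\n", mean, sqrt(var), n);
    return 0;
}

Two runs whose mean values differ by less than these error bars would, in the paper's terms, be indistinguishable.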


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Ronglin Jiang ◽  
Shugang Jiang ◽  
Yu Zhang ◽  
Ying Xu ◽  
Lei Xu ◽  
...  

This paper introduces a finite-difference time-domain (FDTD) code written in Fortran and CUDA for realistic electromagnetic calculations, parallelized with the Message Passing Interface (MPI) and Open Multiprocessing (OpenMP). Since both Central Processing Unit (CPU) and Graphics Processing Unit (GPU) resources are utilized, a faster execution speed can be reached than with a traditional pure GPU code. In our experiments, 64 NVIDIA TESLA K20m GPUs and 64 INTEL XEON E5-2670 CPUs are used to carry out the pure CPU, pure GPU, and CPU + GPU tests. Relative to the pure CPU calculations for the same problems, the speedup ratio achieved by the CPU + GPU calculations is around 14. Compared to the pure GPU calculations for the same problems, the CPU + GPU calculations show a 7.6%–13.2% performance improvement. Because of the limited memory size of GPUs, the FDTD problem size is usually very small; however, this code can enlarge the maximum problem size by 25% without reducing the performance of a traditional pure GPU code. Finally, using this code, a microstrip antenna array with 16×18 elements is calculated and the radiation patterns are compared with those of the Method of Moments (MoM). The results show good agreement between them.
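
To make the MPI + OpenMP layer of such a scheme concrete, here is a small sketch of a one-dimensional FDTD update with MPI halo exchange between subdomains and OpenMP threading within each; the CUDA/GPU path of the paper's code is omitted, and the grid size, Courant number, and source term are illustrative. Build with, e.g., mpicc -fopenmp -lm.

#include <mpi.h>
#include <omp.h>
#include <math.h>
#include <stdio.h>

#define NLOC  4096    /* cells per rank; interior indices 1..NLOC */
#define STEPS 1000
#define C     0.5     /* Courant number (<= 1 for stability in 1-D) */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int left  = rank > 0        ? rank - 1 : MPI_PROC_NULL;
    int right = rank < size - 1 ? rank + 1 : MPI_PROC_NULL;

    /* Field arrays with one ghost cell at each end (zero-initialized). */
    static double ez[NLOC + 2], hy[NLOC + 2];

    for (int t = 0; t < STEPS; t++) {
        /* The Hy update at i = NLOC needs the right neighbor's first Ez. */
        MPI_Sendrecv(&ez[1],        1, MPI_DOUBLE, left,  0,
                     &ez[NLOC + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        #pragma omp parallel for
        for (int i = 1; i <= NLOC; i++)
            hy[i] += C * (ez[i + 1] - ez[i]);

        /* The Ez update at i = 1 needs the left neighbor's last Hy. */
        MPI_Sendrecv(&hy[NLOC], 1, MPI_DOUBLE, right, 1,
                     &hy[0],    1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        #pragma omp parallel for
        for (int i = 1; i <= NLOC; i++)
            ez[i] += C * (hy[i] - hy[i - 1]);

        if (rank == 0)  /* soft Gaussian source in the first subdomain */
            ez[1] += exp(-(t - 30.0) * (t - 30.0) / 100.0);
    }
    if (rank == 0) printf("ez[1] = %g after %d steps\n", ez[1], STEPS);
    MPI_Finalize();
    return 0;
}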


SPE Journal ◽  
2014 ◽  
Vol 19 (04) ◽  
pp. 716-725 ◽  
Author(s):  
Larry S.K. Fung ◽  
Mohammad O. Sindi ◽  
Ali H. Dogru

Summary With the advent of the multicore central-processing unit (CPU), today's commodity PC clusters are effectively a collection of interconnected parallel computers, each with multiple multicore CPUs and large shared random-access memory (RAM), connected together by means of high-speed networks. Each computer, referred to as a compute node, is a powerful parallel computer on its own. Each compute node can be equipped further with acceleration devices such as the general-purpose graphical processing unit (GPGPU) to further speed up computationally intensive portions of the simulator. Reservoir-simulation methods that can exploit this heterogeneous hardware system can be used to solve very-large-scale reservoir-simulation models and run significantly faster than conventional simulators. Because typical PC clusters are essentially distributed shared-memory computers, the use of mixed-paradigm (distributed-shared-memory) parallelism, such as the message-passing interface combined with open multiprocessing (MPI-OMP), should work well for computational efficiency and memory use. In this work, we compare and contrast the single-paradigm programming models, MPI or OMP, with the mixed-paradigm MPI-OMP programming model for a class of solver methods suited to the different modes of parallelism. The results showed that the distributed-memory (MPI-only) model has superior multicompute-node scalability, whereas the shared-memory (OMP-only) model has superior parallel performance on a single compute node. The mixed MPI-OMP model and the OMP-only model are more memory-efficient for the multicore architecture than the MPI-only model because they require less or no halo-cell storage for the subdomains. To exploit the fine-grain shared-memory parallelism available on the GPGPU architecture, algorithms should be suited to single-instruction multiple-data (SIMD) parallelism, and any recursive operations must be serialized. In addition, solver methods and data storage need to be reworked to coalesce memory access and to avoid shared-memory-bank conflicts. Wherever possible, the cost of data transfer through the peripheral component interconnect express (PCIe) bus between the CPU and GPGPU needs to be hidden by means of asynchronous communication. We applied multiparadigm parallelism to accelerate compositional reservoir simulation on a GPGPU-equipped PC cluster. On a dual-CPU-dual-GPGPU compute node, the parallelized solver running on the dual-GPGPU Fermi M2090Q achieved up to 19 times speedup over the serial CPU (1-core) results and up to 3.7 times speedup over the parallel dual-CPU X5675 results in a mixed MPI + OMP paradigm for a 1.728-million-cell compositional model. Parallel performance shows a strong dependency on the subdomain sizes. Parallel CPU solves have higher performance for smaller domain partitions, whereas GPGPU solves require large partitions for each chip for good parallel performance. This is related to improved cache efficiency on the CPU for small subdomains and the loading requirement for massive parallelism on the GPGPU. Therefore, for a given model, the multinode parallel performance decreases for the GPGPU relative to the CPU as the model is further subdivided into smaller subdomains to be solved on more compute nodes. To illustrate this, a modified SPE5 (Killough and Kossack 1987) model with various grid dimensions was run to generate comparative results.
Parallel performance results for three field compositional models of various sizes and dimensions are included to further elucidate and contrast CPU-GPGPU single-node and multinode performance. A PC cluster with the Tesla M2070Q GPGPU and the 6-core Xeon X5675 Westmere was used to produce the majority of the reported results. Another PC cluster with the Tesla M2090Q GPGPU was available for some cases, and the results are reported for the modified SPE5 (Killough and Kossack 1987) problems for comparison.
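
The halo-storage argument above can be illustrated with back-of-the-envelope arithmetic: under a one-dimensional slab decomposition, each internal interface adds a halo plane on both sides, so fewer, larger subdomains (one MPI rank per node, with OMP threads inside) store far fewer halo cells than one rank per core. The node and core counts below are hypothetical; only the 120×120×120 grid is taken from the 1.728-million-cell model mentioned above.

#include <stdio.h>

/* Halo cells for nsub slabs of an nx*ny*nz grid: each of the nsub-1
 * internal interfaces adds one nx*ny halo plane on each side. */
static long halo_cells(long nx, long ny, int nsub) {
    return 2L * (nsub - 1) * nx * ny;
}

int main(void) {
    long nx = 120, ny = 120, nz = 120;  /* the 1.728-million-cell model */
    int nodes = 4, cores = 6;           /* hypothetical cluster shape */

    long mpi_only = halo_cells(nx, ny, nodes * cores); /* one rank per core */
    long mpi_omp  = halo_cells(nx, ny, nodes);         /* one rank per node */

    printf("grid %ldx%ldx%ld = %ld cells\n", nx, ny, nz, nx * ny * nz);
    printf("MPI-only, %2d ranks: %ld halo cells\n", nodes * cores, mpi_only);
    printf("MPI+OMP,  %2d ranks: %ld halo cells\n", nodes, mpi_omp);
    return 0;
}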


Author(s):  
A. V. Nikitina ◽  
A. E. Chistyakov ◽  
A. M. Atayan

The purpose of this work is to create a software package for the distributed solution of the problem of transporting a pollutant in a reservoir with complex bathymetry and the presence of technological structures. An algorithm has been developed for the parallel solution of the pollutant transport problem in a reservoir on a graphics accelerator controlled by the CUDA (Compute Unified Device Architecture) system; a comparative analysis of the algorithms running on a CPU (Central Processing Unit) and on a GPU (Graphics Processing Unit) graphics accelerator made it possible to evaluate their performance. The software implementation of the modules included in the package is described, and the main classes and implemented methods are documented. The results of numerical experiments showed that solving the pollutant transport problem with CUDA technology is ineffective for small grids (up to 100 × 100 computational nodes). In the case of large grids (1000 × 1000 computational nodes), the use of CUDA technology reduces the computation time by an order of magnitude. Analysis of the experiments carried out with the developed software components showed that the maximum ratio of the running time of the algorithm solving the matter transport problem in shallow water on the GPU to the running time of a similar algorithm on the CPU was 24.92, achieved on a grid of 1000 × 1000 computational nodes. Decomposition methods for grid regions are proposed for solving computationally laborious diffusion-convection problems, including the problem of transporting pollutants in a reservoir with complex bathymetry and technological objects, taking into account the architecture and parameters of the MCS (Multiprocessor Computing System) located at the infrastructure facility of the STU (Scientific and Technological University) "Sirius" (Sochi, Russia). The analysis takes into account the time the computing system needs to transmit and receive floating-point data. An algorithm for the parallel solution of the task under the control of MPI (Message Passing Interface) technology has been developed, and its efficiency has been assessed. Acceleration values of the proposed algorithm are obtained depending on the number of computers (processors) involved and the size of the computational grid. The maximum number of computers used is 24; the maximum size of the computational grid was 10 000 × 10 000 computational nodes. The developed algorithm showed low efficiency for small computational grids (up to 100 × 100 computational nodes). In the case of large computational grids (from 1000 × 1000 computational nodes), the use of MPI reduces the computation time by several times.
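
For reference, acceleration figures of this kind are commonly derived as speedup S(p) = T(1)/T(p) with parallel efficiency E(p) = S(p)/p; the sketch below tabulates them from a set of timings. The timings are invented placeholders, not the paper's measurements.

#include <stdio.h>

int main(void) {
    int    procs[] = {1, 2, 4, 8, 16, 24};
    double t_sec[] = {960.0, 492.0, 255.0, 136.0, 75.0, 55.0}; /* hypothetical */
    int n = sizeof procs / sizeof procs[0];

    for (int i = 0; i < n; i++) {
        double s = t_sec[0] / t_sec[i];   /* speedup relative to 1 processor */
        printf("p = %2d  T = %7.1f s  S = %5.2f  E = %5.1f%%\n",
               procs[i], t_sec[i], s, 100.0 * s / procs[i]);
    }
    return 0;
}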


2020 ◽  
Author(s):  
Roudati jannah

Computer hardware is the part of a computer system that can be touched and physically seen, and that acts to carry out the instructions of the software. Computer hardware is also referred to simply as hardware. Hardware plays an all-encompassing role in the performance of a computer system. In principle, a computer system always has input hardware (input device system), processing hardware (central processing unit), output hardware (output device system), optional additional devices (peripherals), and data storage (storage device system/external memory).


2020 ◽  
Author(s):  
Ika Milia wahyunu Siregar

IT is developing very rapidly worldwide, from software to hardware. Technology now dominates most of the surface of the earth. Because technology develops so quickly, we as users can fall behind on information about new technology if we do not keep our knowledge up to date. That can make us easily tempted and deceived by all kinds of technology advertising without considering the downsides. As computer users, we should know about the components of a computer. A computer is a set of electronic machines consisting of millions of components that work together, forming a neat and precise working system. This system is then used to carry out work automatically, based on the instructions (programs) given to it. The term computer hardware refers to objects that can physically be held, moved, and seen. The Central Processing System/Central Processing Unit (CPU) is a type of hardware that serves as the place where data is processed; it can also be described as the brain of all processing activities, such as calculating, sorting, searching, writing, reading, and so on.


2020 ◽  
Author(s):  
Intan khadijah simatupang

A computer is a set of electronic machines consisting of millions of components that work together, forming a neat and precise working system. This system is then used to carry out work automatically, based on the instructions (programs) given to it. The term computer hardware refers to objects that can physically be held, moved, and seen. Computer software is a collection of instructions (programs/procedures) for carrying out work automatically by processing the collection of instructions (data) it is given. In principle, a computer system always has input hardware (input device system), processing hardware (central processing unit), output hardware (output device system), optional additional devices (peripherals), and data storage (storage device system/external memory).


2020 ◽  
Author(s):  
Siti Kumala Dewi

Computer hardware is the part of a computer system that can be touched and physically seen, and that acts to carry out the instructions of the software. Computer hardware is also referred to simply as hardware. Hardware plays an all-encompassing role in the performance of a computer system. By function, hardware is divided into:
1. the input hardware system (input device system);
2. the processing system (central processing system/central processing unit, CPU);
3. the output hardware system (output device system);
4. additional devices (peripheral/accessories device system).

