The MPI and OpenMP Implementation of Parallel Algorithm for Generating Mandelbrot Set

2014 ◽  
Vol 571-572 ◽  
pp. 26-29
Author(s):  
Xiang Wei Duan ◽  
Wei Chang Shen ◽  
Jun Guo

The paper introduces the Mandelbrot set together with the Message Passing Interface (MPI) and shared-memory OpenMP programming models, analyses the characteristics of algorithm design in the MPI and OpenMP environments, describes the implementation of a parallel algorithm for generating the Mandelbrot set in each environment, reports a series of evaluations and performance tests conducted during execution, and then compares the differences between the two implementations.
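
A minimal sketch of the kind of loop-level parallelism the paper describes, not the authors' actual code: OpenMP distributes the image rows across threads, and the same row-wise decomposition could equally be spread across MPI ranks. Image size, iteration limit, and the viewing window are illustrative assumptions.

```c
#include <omp.h>
#include <stdio.h>

#define WIDTH    1024
#define HEIGHT   1024
#define MAX_ITER 1000

/* Escape-time iteration count for one pixel of the Mandelbrot set. */
static int mandel(double cr, double ci) {
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (zr * zr + zi * zi <= 4.0 && n < MAX_ITER) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        n++;
    }
    return n;
}

int main(void) {
    static int image[HEIGHT][WIDTH];

    /* Rows are independent, so they can be shared out among OpenMP threads;
       dynamic scheduling balances the uneven per-row cost. */
    #pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < HEIGHT; y++) {
        for (int x = 0; x < WIDTH; x++) {
            double cr = -2.0 + 3.0 * x / WIDTH;
            double ci = -1.5 + 3.0 * y / HEIGHT;
            image[y][x] = mandel(cr, ci);
        }
    }
    printf("center pixel iterations: %d\n", image[HEIGHT / 2][WIDTH / 2]);
    return 0;
}
```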

2013 ◽  
Vol 30 (7) ◽  
pp. 1382-1397 ◽  
Author(s):  
Yunheng Wang ◽  
Youngsun Jung ◽  
Timothy A. Supinie ◽  
Ming Xue

Abstract A hybrid parallel scheme for the ensemble square root filter (EnSRF) suitable for parallel assimilation of multiscale observations, including those from dense observational networks such as those of radar, is developed based on the domain decomposition strategy. The scheme handles internode communication through a message passing interface (MPI) and the communication within shared-memory nodes via Open Multiprocessing (OpenMP) threads. It also supports pure MPI and pure OpenMP modes. The parallel framework can accommodate high-volume remote-sensed radar (or satellite) observations as well as conventional observations that usually have larger covariance localization radii. The performance of the parallel algorithm has been tested with simulated and real radar data. The parallel program shows good scalability in pure MPI and hybrid MPI–OpenMP modes, while pure OpenMP runs exhibit limited scalability on a symmetric shared-memory system. It is found that in MPI mode, better parallel performance is achieved with domain decomposition configurations in which the leading dimension of the state variable arrays is larger, because this configuration allows for more efficient memory access. Given a fixed amount of computing resources, the hybrid parallel mode is preferred to pure MPI mode on supercomputers with nodes containing shared-memory cores. The overall performance is also affected by factors such as the cache size, memory bandwidth, and the networking topology. Tests with a real data case with a large number of radars confirm that the parallel data assimilation can be done on a multicore supercomputer with a significant speedup compared to the serial data assimilation algorithm.
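
A hedged sketch of the hybrid pattern described above, not the EnSRF code itself: MPI decomposes the horizontal domain across nodes, OpenMP threads work on the local subdomain, and the state arrays are laid out so that the leading (fastest-varying) dimension is contiguous, which is the memory-access point the abstract makes. Grid sizes and the field update are illustrative placeholders.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int provided, rank, size;
    /* Request threaded MPI: each rank spawns OpenMP threads internally. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Hypothetical global grid, split into horizontal slabs per rank. */
    const int nx = 1024, ny_global = 1024, nz = 50;
    int ny_local = ny_global / size;

    /* Leading dimension (x) is contiguous, so inner loops stride by 1. */
    double *state = malloc((size_t)nx * ny_local * nz * sizeof(double));

    #pragma omp parallel for collapse(2)
    for (int k = 0; k < nz; k++)
        for (int j = 0; j < ny_local; j++)
            for (int i = 0; i < nx; i++)
                state[((size_t)k * ny_local + j) * nx + i] = 1.0; /* placeholder update */

    double local_sum = 0.0, global_sum = 0.0;
    for (size_t n = 0; n < (size_t)nx * ny_local * nz; n++) local_sum += state[n];
    /* Internode communication stays in MPI, as in the hybrid scheme. */
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads=%d sum=%f\n", size, omp_get_max_threads(), global_sum);
    free(state);
    MPI_Finalize();
    return 0;
}
```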


2019 ◽  
Vol 214 ◽  
pp. 05029 ◽  
Author(s):  
Alexey Rybalchenko ◽  
Dennis Klein ◽  
Mohammad Al-Turany ◽  
Thorsten Kollegger

The high data rates expected for the next generation of particle physics experiments (e.g., new experiments at FAIR/GSI and the upgrades of the CERN experiments) call for dedicated attention to the design of the needed computing infrastructure. The common ALICE-FAIR framework ALFA is a modern software layer that serves as a platform for simulation, reconstruction, and analysis of particle physics experiments. Besides the standard services needed for simulation and reconstruction, ALFA also provides tools for data transport, configuration, and deployment. The FairMQ module in ALFA offers building blocks for creating distributed software components (processes) that communicate with each other via message passing. The abstract "message passing" interface in FairMQ currently has three implementations: ZeroMQ, nanomsg, and shared memory. The newly developed shared memory transport, which provides significant performance benefits for transferring large data chunks between components on the same node, will be presented. The implementation in FairMQ allows users to switch between the different transports via a trivial configuration change. The design decisions, implementation details, and performance numbers of the shared memory transport in FairMQ/ALFA will be highlighted.
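
A generic POSIX illustration of why a shared-memory transport pays off for large chunks, not FairMQ's actual API: the bulk payload is written once into a named segment, and only a small descriptor needs to travel between processes on the same node. Segment name and size are assumptions for the sketch.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Producer side of a toy shared-memory transfer: the payload lands in a
   named shared-memory segment; a consumer that maps the same name can
   read it without any copy over a socket. */
int main(void) {
    const char *name = "/demo_shm_chunk";     /* hypothetical segment name */
    const size_t size = 1 << 20;              /* 1 MiB payload */

    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, (off_t)size) != 0) { perror("shm"); return 1; }

    char *buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    memset(buf, 0xAB, size);                  /* "produce" the data chunk */
    printf("payload ready in %s; a peer only needs the name and size\n", name);

    /* In a real transport, only this small descriptor (name, offset, size)
       travels over the message queue; the data itself never moves. */
    munmap(buf, size);
    close(fd);
    /* shm_unlink(name) would be called once the consumer is done. */
    return 0;
}
```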


2015 ◽  
Vol 8 (3) ◽  
pp. 2369-2402
Author(s):  
W. He ◽  
C. Beyer ◽  
J. H. Fleckenstein ◽  
E. Jang ◽  
O. Kolditz ◽  
...  

Abstract. This technical paper presents an efficient and performance-oriented method to model reactive mass transport processes in environmental and geotechnical subsurface systems. The open source scientific software packages OpenGeoSys and IPhreeqc have been coupled to combine their individual strengths and features to simulate thermo-hydro-mechanical-chemical coupled processes in porous and fractured media with simultaneous consideration of aqueous geochemical reactions. Furthermore, a flexible parallelization scheme using MPI (Message Passing Interface) grouping techniques has been implemented, which allows an optimized allocation of computer resources for the node-wise calculation of chemical reactions on the one hand, and for the underlying processes such as groundwater flow or solute transport on the other hand. The coupling interface and parallelization scheme have been tested and verified in terms of precision and performance.
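
A hedged sketch of the MPI grouping idea, not the OpenGeoSys/IPhreeqc implementation: MPI_Comm_split carves the global communicator into one group for flow/transport and one for node-wise chemistry, so resources can be allocated to each task independently. The one-quarter/three-quarter split is an arbitrary assumption.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Hypothetical split: the first quarter of the ranks solve flow and
       transport, the rest handle node-wise geochemical reactions. */
    int chemistry = (rank >= size / 4);
    MPI_Comm group_comm;
    MPI_Comm_split(MPI_COMM_WORLD, chemistry, rank, &group_comm);

    int group_rank, group_size;
    MPI_Comm_rank(group_comm, &group_rank);
    MPI_Comm_size(group_comm, &group_size);
    printf("world rank %d -> %s group, local rank %d of %d\n",
           rank, chemistry ? "chemistry" : "flow/transport",
           group_rank, group_size);

    /* Each group now works on its own communicator; results would be
       exchanged between groups at every coupling step. */
    MPI_Comm_free(&group_comm);
    MPI_Finalize();
    return 0;
}
```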


2013 ◽  
Vol 2013 ◽  
pp. 1-7 ◽  
Author(s):  
Xizhong Wang ◽  
Deyun Chen

We introduce a parallel chaos-based encryption algorithm that takes advantage of multicore processors. The chaotic cryptosystem is built on the piecewise linear chaotic map (PWLCM). The parallel algorithm follows a master/slave communication model implemented with the Message Passing Interface (MPI). The algorithm is suitable not only for multicore processors but also for single-processor architectures. The experimental results show that the chaos-based cryptosystem possesses good statistical properties. The parallel algorithm provides much better performance than its serial counterpart and is well suited to encrypting and decrypting large files or multimedia data.
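
A minimal sketch of the master/slave pattern described, not the paper's cryptosystem: the master scatters plaintext blocks to the ranks, each rank derives a keystream by iterating the PWLCM, XORs its block, and the ciphertext is gathered back. The rank-dependent seeding and block size are illustrative assumptions, not a secure key schedule.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define BLOCK 8   /* bytes handled per rank in this toy example */

/* Piecewise linear chaotic map (PWLCM) with control parameter p in (0, 0.5). */
static double pwlcm(double x, double p) {
    if (x < p)         return x / p;
    else if (x <= 0.5) return (x - p) / (0.5 - p);
    else               return pwlcm(1.0 - x, p);
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    unsigned char *plain = NULL, *cipher = NULL, block[BLOCK];
    if (rank == 0) {                       /* master prepares the plaintext */
        plain  = malloc((size_t)size * BLOCK);
        cipher = malloc((size_t)size * BLOCK);
        for (int i = 0; i < size * BLOCK; i++) plain[i] = (unsigned char)i;
    }

    /* Master/slave style: the master scatters one block to every rank. */
    MPI_Scatter(plain, BLOCK, MPI_UNSIGNED_CHAR,
                block, BLOCK, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);

    /* Each rank iterates the PWLCM from a rank-dependent state
       (hypothetical key schedule, for illustration only). */
    double x = 0.123456 + 0.001 * rank, p = 0.3;
    for (int warm = 0; warm < 100; warm++) x = pwlcm(x, p);
    for (int i = 0; i < BLOCK; i++) {
        x = pwlcm(x, p);
        block[i] ^= (unsigned char)(x * 255.0);   /* stream-cipher XOR */
    }

    MPI_Gather(block, BLOCK, MPI_UNSIGNED_CHAR,
               cipher, BLOCK, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);
    if (rank == 0) { printf("first cipher byte: %02x\n", cipher[0]); free(plain); free(cipher); }
    MPI_Finalize();
    return 0;
}
```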


2017 ◽  
Vol 2017 ◽  
pp. 1-12 ◽  
Author(s):  
Anuj Sharma ◽  
Irene Moulitsas

High-resolution numerical methods and unstructured meshes are required in many applications of Computational Fluid Dynamics (CFD). These methods are quite computationally expensive and hence benefit from being parallelized. Message Passing Interface (MPI) has been utilized traditionally as a parallelization strategy. However, the inherent complexity of MPI contributes further to the existing complexity of the CFD scientific codes. The Partitioned Global Address Space (PGAS) parallelization paradigm was introduced in an attempt to improve the clarity of the parallel implementation. We present our experiences of converting an unstructured high-resolution compressible Navier-Stokes CFD solver from MPI to PGAS Coarray Fortran. We present the challenges, methodology, and performance measurements of our approach using Coarray Fortran. With the Cray compiler, we observe Coarray Fortran as a viable alternative to MPI. We are hopeful that Intel and open-source implementations could be utilized in the future.
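
For context, a minimal sketch (in C, with assumed array sizes) of the explicit two-sided halo exchange that MPI-based solvers of this kind typically perform; in the Coarray Fortran port each send/receive pair collapses to a direct one-sided reference to a neighbour image, which is the clarity gain the authors pursue.

```c
#include <mpi.h>
#include <stdio.h>

#define N 100   /* interior cells per rank (hypothetical) */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Local 1-D slab with one ghost cell on each side. */
    double u[N + 2];
    for (int i = 0; i < N + 2; i++) u[i] = -1.0;
    for (int i = 1; i <= N; i++)    u[i] = rank;

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Explicit two-sided halo exchange: send edge cells, receive ghosts.
       A Coarray Fortran version would express each pair as a single
       one-sided assignment referencing the neighbour image. */
    MPI_Sendrecv(&u[N], 1, MPI_DOUBLE, right, 0,
                 &u[0], 1, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  1,
                 &u[N + 1], 1, MPI_DOUBLE, right, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d ghosts: left=%f right=%f\n", rank, u[0], u[N + 1]);
    MPI_Finalize();
    return 0;
}
```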


Author(s):  
NENAD STANKOVIC ◽  
KANG ZHANG

The attractiveness of visual programming stems in large part from the direct interaction with program elements as if they were real objects, since people deal better with concrete objects than with abstractions. This paper describes a new graph-based software visualization tool for parallel message-passing programming, named Visper, that combines the levels of abstraction at which message-passing parallel programs are expressed and makes use of compositional programming. Central to the tool is the Process Communication Graph, which correlates the control and data flow graphs into a single graph formalism without the need for complex textual annotation. The graph can express static and runtime communication and replication structures, as found in the Message Passing Interface (MPI) and Parallel Virtual Machine (PVM). It also forms the basis for visualizing parallel debugging and performance.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Hongkun Zhang ◽  
Xinmin Liu

Cloud-based services have been increasingly used to provide on-demand access to a large volume of computing requests, such as data, computing, and resources, for which it is vitally important to correctly select and assign the right resources to a workload or application. This paper presents a novel online reverse auction scheme, based on an online algorithm, for allocating cloud computing services, which can help cloud users and providers build workflow applications in a cloud computing environment. The online reverse auction scheme consists of three parts: online algorithm design, competitive ratio calculation, and performance evaluation. The online reverse auction-based algorithm is proposed for the cloud user agent to choose the final winners based on the Vickrey–Clarke–Groves (VCG) mechanism and an online algorithm (OA). Competitive analysis is applied to calculate the competitive ratio of the proposed algorithm relative to the offline algorithm; this method measures the performance of the proposed algorithm without assuming any distribution of the cloud providers' bids. The results show that the proposed online reverse auction-based algorithm is an appropriate mechanism because it allows the cloud user agent to make purchase decisions without knowing the future bids. The number of auction rounds and the transaction cost can significantly influence and improve the performance of the proposed reverse auction algorithm.
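
A simplified illustration of the VCG idea in a single-item reverse auction round, not the paper's full online mechanism: the buyer's agent picks the lowest ask and pays that winner the second-lowest ask, so truthful bidding is the providers' best strategy. The bid values are made up for the example.

```c
#include <float.h>
#include <stdio.h>

int main(void) {
    double bids[] = {4.2, 3.1, 5.0, 3.7};   /* hypothetical provider asks */
    int n = (int)(sizeof bids / sizeof bids[0]);

    /* Find the lowest and second-lowest asks in one pass. */
    int winner = 0;
    double best = DBL_MAX, second = DBL_MAX;
    for (int i = 0; i < n; i++) {
        if (bids[i] < best)        { second = best; best = bids[i]; winner = i; }
        else if (bids[i] < second) { second = bids[i]; }
    }

    /* In the online setting this selection is repeated each round as new
       bids arrive, without knowledge of future rounds. */
    printf("winner: provider %d (bid %.2f), VCG payment %.2f\n",
           winner, best, second);
    return 0;
}
```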


1997 ◽  
Vol 6 (2) ◽  
pp. 201-214 ◽  
Author(s):  
Luis M. Silva ◽  
João Gabriel Silva ◽  
Simon Chapple

Distributed shared memory (DSM) has been recognized as an alternative programming model for exploiting the parallelism in distributed memory systems, because it provides a higher level of abstraction than simple message passing. DSM combines the simple programming model of shared memory with the scalability of distributed memory machines. This article presents DSMPI, a parallel library that runs atop MPI and provides a DSM abstraction. It offers an easy-to-use programming interface, is fully portable, and supports heterogeneity. For the sake of flexibility, it supports different coherence protocols and consistency models. We present some performance results, obtained on a network of workstations and on a Cray T3D, which show that DSMPI can be competitive with MPI for some applications.
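
A toy owner-based sketch of the DSM-over-MPI idea, not DSMPI's actual interface or coherence protocols: one rank owns a "shared" integer and services read and write requests that arrive as MPI messages, while the other ranks each perform one read-modify-write through that protocol.

```c
#include <mpi.h>
#include <stdio.h>

enum { TAG_READ = 1, TAG_WRITE = 2, TAG_REPLY = 3 };

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        int shared = 0;                       /* the "shared" datum, owned here */
        /* Each of the size-1 clients sends one read and then one write. */
        for (int req = 0; req < 2 * (size - 1); req++) {
            int value;
            MPI_Status st;
            MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_READ)
                MPI_Send(&shared, 1, MPI_INT, st.MPI_SOURCE, TAG_REPLY,
                         MPI_COMM_WORLD);
            else                               /* TAG_WRITE */
                shared = value;
        }
        /* Concurrent read-modify-writes can interleave; a real DSM layer
           adds locking and consistency models to avoid lost updates. */
        printf("final shared value: %d\n", shared);
    } else {
        int dummy = 0, value;
        MPI_Send(&dummy, 1, MPI_INT, 0, TAG_READ, MPI_COMM_WORLD);   /* read  */
        MPI_Recv(&value, 1, MPI_INT, 0, TAG_REPLY, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        value += 1;                                                  /* modify */
        MPI_Send(&value, 1, MPI_INT, 0, TAG_WRITE, MPI_COMM_WORLD);  /* write  */
    }
    MPI_Finalize();
    return 0;
}
```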

