Parallel Implementation of Non-slicing Floorplans with MPI and OpenMP

10.32920/ryerson.14647368.v1 ◽

2021 ◽

Author(s):

Oluvaseun Owojaiye

Keyword(s):

Message Passing ◽

Large Scale ◽

Message Passing Interface ◽

Parallel Implementation ◽

Computation Time ◽

Sequential Algorithm ◽

Design Stage ◽

Single Chip ◽

Solution Quality ◽

Early Design Stage

Advances in technology have brought considerable improvements to processor design, and manufacturers now place multiple processors on a single chip. Supercomputers today consist of clusters of interconnected nodes that collaborate to solve complex computational problems. Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) are the most widely used programming models for speeding up sequential codes by parallelizing them on today's multiprocessor architectures. In this thesis, we parallelize the non-slicing floorplanning algorithm based on Multilevel Floorplanning/placement of large-scale modules using B*tree (MB*tree) with MPI and OpenMP, on distributed- and shared-memory architectures respectively. In VLSI (Very Large Scale Integration) design automation, floorplanning is an initial and vital task performed in the early design stage. Experimental results on the MCNC benchmark circuits show that our parallel algorithm outperformed the corresponding sequential algorithm: we sped the algorithm up by as much as 4 times, reducing computation time while maintaining floorplan solution quality. Comparing the two parallel versions, the OpenMP results were slightly better than the corresponding MPI results.
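The speedup figure above is conventionally defined relative to the sequential run time. Let $T_1$ be the sequential time and $T_p$ the time with $p$ MPI processes or OpenMP threads. This notation is introduced here and is not taken from the thesis. Speedup and parallel efficiency are then:

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}
```

As a purely hypothetical illustration, $T_1 = 400$ s and $T_p = 100$ s on $p = 8$ cores give $S_8 = 4$ and $E_8 = 0.5$. The abstract does not state the core counts behind its fourfold speedup.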

Comparative Study between Parallel K-Means and Parallel K-Medoids with Message Passing Interface (MPI)

International Journal on Information and Communication Technology (IJoICT) ◽

10.21108/ijoict.2016.22.86 ◽

2017 ◽

Vol 2 (2) ◽

pp. 27

Author(s):

Fhira Nhita

Keyword(s):

Data Mining ◽

Message Passing ◽

High Performance ◽

Message Passing Interface ◽

Clustering Algorithms ◽

Computation Time ◽

Sequential Algorithm ◽

Data Mining Technique ◽

Combination Technology ◽

Performance Computing

Data mining combines techniques such as classification and clustering to extract useful information from datasets. Clustering is one of the most widely used data mining techniques today. K-Means and K-Medoids are among the most commonly used clustering algorithms because they are easy to implement, efficient, and produce good results. Besides extracting important information, the time spent mining data is also a concern today, since real-world applications produce huge volumes of data. This research analyzes the results and time performance of the K-Means and K-Medoids algorithms, parallelized on a High Performance Computing (HPC) cluster using the Message Passing Interface (MPI) library. The results show that K-Means gives a smaller SSE (sum of squared errors) than K-Medoids, and that the parallel MPI implementations compute faster than the sequential algorithms.
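Parallel K-Means is typically organized as data parallelism. Each process clusters its share of the points, and per-cluster sums and counts are combined globally before the centroids are updated. The following Python sketch illustrates that pattern under stated assumptions. It simulates the ranks in a single process. `split`, `local_step`, `allreduce` and `parallel_kmeans` are hypothetical helpers, and `allreduce` stands in for MPI's `MPI_Allreduce` rather than calling it. This is not the study's implementation.

```python
# Data-parallel K-Means sketch: each simulated "rank" owns a block of points,
# computes partial sums locally, and the partials are summed the way an
# MPI_Allreduce would combine them. Pure standard library; no real MPI is used.
from math import dist


def split(points, n_ranks):
    """Block-partition the data set across n_ranks simulated processes."""
    size, rem = divmod(len(points), n_ranks)
    chunks, start = [], 0
    for r in range(n_ranks):
        end = start + size + (1 if r < rem else 0)
        chunks.append(points[start:end])
        start = end
    return chunks


def local_step(chunk, centroids):
    """Per-rank work: assign local points; accumulate sums, counts and SSE."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    sse = 0.0
    for p in chunk:
        j = min(range(k), key=lambda c: dist(p, centroids[c]))
        counts[j] += 1
        sse += dist(p, centroids[j]) ** 2
        for i in range(d):
            sums[j][i] += p[i]
    return sums, counts, sse


def allreduce(partials):
    """Element-wise sum of per-rank partials (stand-in for MPI_Allreduce)."""
    sums = [[sum(col) for col in zip(*rows)]
            for rows in zip(*(p[0] for p in partials))]
    counts = [sum(c) for c in zip(*(p[1] for p in partials))]
    sse = sum(p[2] for p in partials)
    return sums, counts, sse


def parallel_kmeans(points, centroids, n_ranks=4, iters=10):
    """Return (centroids, SSE of the final assignment)."""
    chunks = split(points, n_ranks)
    for _ in range(iters):
        # Under MPI each local_step call would run concurrently on its own rank.
        sums, counts, _ = allreduce([local_step(c, centroids) for c in chunks])
        centroids = [[s / counts[j] for s in sums[j]] if counts[j] else centroids[j]
                     for j in range(len(centroids))]
    _, _, sse = allreduce([local_step(c, centroids) for c in chunks])
    return centroids, sse
```

The reduction only sums partials. So running with `n_ranks=1` gives the same centroids and SSE as any other partition, which is a convenient correctness check. With a real MPI binding, each `local_step` call would execute on its own rank.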

Parallel workflow manager for non-parallel bioinformatic applications to solve large-scale biological problems on a supercomputer

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720016410080 ◽

2016 ◽

Vol 14 (02) ◽

pp. 1641008 ◽

Cited By ~ 8

Author(s):

Dmitry Suplatov ◽

Nina Popova ◽

Sergey Zhumatiy ◽

Vladimir Voevodin ◽

Vytas Švedas

Keyword(s):

Message Passing ◽

Large Scale ◽

Message Passing Interface ◽

Parallel Implementation ◽

Biological Data ◽

Rapid Expansion ◽

Biological Macromolecules ◽

Processing Unit ◽

Common Input ◽

Systematic Analysis

Rapid expansion of online resources providing access to genomic, structural, and functional information associated with biological macromolecules opens an opportunity to gain a deeper understanding of the mechanisms of biological processes through systematic analysis of large datasets. This, however, requires novel strategies to optimally utilize computer processing power. Some methods in bioinformatics and molecular modeling require extensive computational resources. Other algorithms have fast implementations that take at most several hours to analyze a common input on a modern desktop workstation. However, because they must be invoked for a large number of subtasks, the full task still requires significant computing power. An efficient computational solution to large-scale biological problems therefore requires both a wise parallel implementation of resource-hungry methods and a smart workflow to manage multiple invocations of relatively fast algorithms. In this work, new software, mpiWrapper, has been developed to accommodate non-parallel implementations of scientific algorithms within the parallel supercomputing environment. The Message Passing Interface is used to exchange information between nodes. Each processing unit runs two specialized threads: one for task management and communication, and another for subtask execution. This avoids deadlock while using blocking MPI calls. The mpiWrapper can launch any conventional Linux application without modifying its original source code, and it supports resubmission of subtasks on node failure. We show that this approach can process huge amounts of biological data efficiently by running non-parallel programs in parallel mode on a supercomputer. The C++ source code and documentation are available from http://biokinet.belozersky.msu.ru/mpiWrapper .
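A workflow manager of this kind rests on a task-farm idea: independent runs of unmodified programs, with resubmission when a subtask fails. It can be sketched as follows. This minimal Python illustration assumes that subtasks are independent and hashable. Threads replace MPI processes, and `run_farm` is a hypothetical helper rather than part of mpiWrapper.

```python
# Task-farm sketch: a pool of workers runs independent, unmodified subtasks and
# resubmits a subtask that fails, up to a retry limit. Threads stand in for MPI
# processes here; run_farm is a hypothetical helper, not mpiWrapper's interface.
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_farm(tasks, worker, n_workers=4, max_retries=2):
    """Run worker(task) for every task.

    Returns ({task: result}, [tasks that still failed after max_retries]).
    """
    tasks = list(tasks)
    results, failed = {}, []
    attempts = {t: 0 for t in tasks}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        pending = {pool.submit(worker, t): t for t in tasks}
        while pending:
            # Drain the current batch; resubmissions are picked up next round.
            for fut in as_completed(list(pending)):
                task = pending.pop(fut)
                try:
                    results[task] = fut.result()
                except Exception:
                    attempts[task] += 1
                    if attempts[task] <= max_retries:
                        pending[pool.submit(worker, task)] = task  # resubmit
                    else:
                        failed.append(task)
    return results, failed
```

To wrap an external program, `worker` could call `subprocess.run(cmd, check=True)`. That call raises `CalledProcessError` on a non-zero exit status, which triggers resubmission.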

TRACING THE EMERGENCE OF DESIGN PROBLEMS AND THEIR IMPACTS ON THE COMPLEXITY OF ENGINEERING SOLUTIONS

Proceedings of the Design Society ◽

10.1017/pds.2021.584 ◽

2021 ◽

Vol 1 ◽

pp. 3229-3238

Author(s):

Torben Beernaert ◽

Pascal Etman ◽

Maarten De Bock ◽

Ivo Classen ◽

Marco De Baar

Keyword(s):

Large Scale ◽

Fusion Reactor ◽

Design Stage ◽

Engineering Model ◽

Design Problems ◽

Nuclear Fusion Reactor ◽

Early Design Stage ◽

Iterative Design ◽

Problems And Solutions ◽

Emergent Design

The design of ITER, a large-scale nuclear fusion reactor, is intertwined with profound research and development efforts. Tough problems call for novel solutions, but the low maturity of those solutions can lead to unexpected problems. If designers keep solving such emergent problems in iterative design cycles, the complexity of the resulting design is bound to increase. Instead, we want to show designers the sources of emergent design problems, so they may be dealt with more effectively. We propose to model the interplay between multiple problems and solutions in a problem network. Each problem and solution is then connected to a dynamically changing engineering model, a graph of physical components. By analysing the problem network and the engineering model, we can (1) derive which problem has emerged from which solution and (2) compute the contribution of each design effort to the complexity of the evolving engineering model. The method is demonstrated for a sequence of problems and solutions that characterized the early design stage of an optical subsystem of ITER.
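At its core, tracing which solution a problem emerged from is a walk over the problem network. The sketch below assumes a much simpler data layout than the paper's: two hypothetical dictionaries, `addresses` and `emerged_from`. It does not model the engineering graph or the authors' complexity metric. It only shows how an emergence chain can be recovered.

```python
# Problem-network sketch: each solution addresses one problem, and a problem
# may have emerged from a solution. Following the "emerged from" links
# backwards traces an emergent problem to its root. The dictionaries and names
# are hypothetical; the paper's problem network and engineering model are richer.


def trace_origin(problem, addresses, emerged_from):
    """Return the chain [problem, solution, problem, ...] back to a root problem.

    addresses:    solution -> problem it was introduced to solve
    emerged_from: problem  -> solution whose adoption raised it
    """
    chain, seen = [problem], {problem}
    while problem in emerged_from:
        solution = emerged_from[problem]
        problem = addresses[solution]
        chain += [solution, problem]
        if problem in seen:  # stop on cyclic data instead of looping forever
            break
        seen.add(problem)
    return chain


def emergence_depth(problem, addresses, emerged_from):
    """Number of solution steps between a problem and its root problem."""
    return (len(trace_origin(problem, addresses, emerged_from)) - 1) // 2
```

For example, suppose solution `S1` addresses `P1` and raises `P2`, and `S2` addresses `P2` and raises `P3`. Then `trace_origin("P3", ...)` returns `["P3", "S2", "P2", "S1", "P1"]`.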

Parallel implementation for HSLO(3)-FDTD with message passing interface on Distributed Memory Architecture

2006 International Conference on Computing & Informatics ◽

10.1109/icoci.2006.5276531 ◽

2006 ◽

Author(s):

Mohammad Khatim Hasan ◽

Mohamed Othman ◽

Jalil Md Desa ◽

Zulkifly Abbas ◽

Jumat Sulaiman

Keyword(s):

Message Passing ◽

Message Passing Interface ◽

Distributed Memory ◽

Parallel Implementation ◽

Memory Architecture ◽

Distributed Memory Architecture

A lightweight approach to performance portability with targetDP

The International Journal of High Performance Computing Applications ◽

10.1177/1094342016682071 ◽

2016 ◽

Vol 32 (2) ◽

pp. 288-301

Author(s):

Alan Gray ◽

Kevin Stratford

Keyword(s):

Particle Physics ◽

Message Passing ◽

Graphics Processing Units ◽

High Performance ◽

Large Scale ◽

Message Passing Interface ◽

Graphics Processing Unit ◽

Processing Unit ◽

Performance Portability ◽

Graphics Processing

Leading high performance computing systems achieve their status through the use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer that allows grid-based applications to target data-parallel hardware in a platform-agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with the Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this through provision of scaling results on traditional and graphics processing unit-accelerated large-scale supercomputers.

Space-time clustering-based method to optimize shareability in real-time ride-sharing

PLoS ONE ◽

10.1371/journal.pone.0262499 ◽

2022 ◽

Vol 17 (1) ◽

pp. e0262499

Author(s):

Negin Alisoltani ◽

Mostafa Ameli ◽

Mahdi Zargayouna ◽

Ludovic Leclercq

Keyword(s):

Real Time ◽

Large Scale ◽

Computation Time ◽

Clustering Method ◽

Matching Problem ◽

Solution Quality ◽

Mobility Service ◽

Ride Sharing ◽

Large Scale Problems ◽

Spatio Temporal

Real-time ride-sharing has become popular in recent years. However, the underlying optimization problem for this service is highly complex. The most critical challenges in solving it are solution quality and computation time, especially in large-scale problems where the number of received requests is huge. In this paper, we rely on an exact solving method to ensure solution quality, while using AI-based techniques to limit the number of requests fed to the solver. More precisely, we propose a clustering method based on a new shareability function to group the most shareable trips into separate clusters. Previous studies only consider spatio-temporal dependencies when clustering mobility service requests, which is not efficient for finding shareable trips. Here, we define the shareability function to consider all the different sharing states for each pair of trips. Each cluster is then handled by a proposed heuristic framework that solves the matching problem inside it. As the method favors sharing, we introduce sharing constraints that let the service choose the number of shared trips. To validate our proposal, we apply the proposed method to the network of Lyon, France, with half a million requests during the morning peak from 6 to 10 AM. The results demonstrate that the algorithm can provide high-quality solutions in a short time for large-scale problems. The proposed clustering method can also be used for other mobility service problems such as car-sharing and bike-sharing.

Simulation-Based Scheduling of Waterway Projects Using a Parallel Genetic Algorithm

Transportation Systems and Engineering ◽

10.4018/978-1-4666-8473-7.ch016 ◽

2015 ◽

pp. 334-347 ◽

Cited By ~ 2

Author(s):

Ning Yang ◽

Shiaaulir Wang ◽

Paul Schonfeld

Keyword(s):

Genetic Algorithm ◽

Parallel Computing ◽

Message Passing ◽

Message Passing Interface ◽

Computation Time ◽

Parallel Genetic Algorithm ◽

Simulation Based ◽

Multiple Processors ◽

Simulation Based Optimization ◽

Speed Up

A Parallel Genetic Algorithm (PGA) is used for simulation-based optimization of waterway project schedules. This PGA distributes a Genetic Algorithm application over multiple processors to speed up the solution search for a very large combinatorial problem. The proposed PGA is based on a global parallel model, also called a master-slave model. The Message Passing Interface (MPI) is used to develop the parallel computing program. A case study is presented whose results show how adapting a simulation-based optimization algorithm to parallel computing can greatly reduce computation time. Additional techniques found to further improve PGA performance include: (1) choosing an appropriate task distribution method, (2) distributing simulation replications instead of different solutions, (3) avoiding the simulation of duplicate solutions, (4) avoiding running multiple simulations simultaneously on shared-memory processors, and (5) avoiding the use of multiple processors that belong to different clusters (physical sub-networks).
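A global (master-worker) PGA keeps selection and variation on the master and farms out only fitness evaluations. Technique (3), skipping duplicate solutions, amounts to caching evaluated schedules. The Python sketch below illustrates both points under explicit assumptions:

- `toy_cost` is a hypothetical stand-in for the waterway simulation.
- Order crossover and swap mutation are generic permutation operators chosen here.
- A thread pool stands in for MPI workers. For CPU-bound Python code, processes or MPI ranks would be needed for real speedup.

```python
# Global (master-worker) parallel GA sketch: the master does selection,
# crossover and mutation; only fitness evaluations are farmed out to workers.
# A cache skips re-evaluating duplicate schedules. toy_cost is a hypothetical
# stand-in for the waterway scheduling simulation.
import random
from concurrent.futures import ThreadPoolExecutor


def toy_cost(schedule):
    """Hypothetical cost: distance of each project from its preferred slot."""
    return sum(abs(slot - proj) for slot, proj in enumerate(schedule))


def order_crossover(a, b, rng):
    """OX: keep a slice of parent a, fill the rest in parent b's order."""
    n = len(a)
    i, j = sorted(rng.sample(range(n + 1), 2))
    kept = set(a[i:j])
    fill = iter(g for g in b if g not in kept)
    return tuple(a[k] if i <= k < j else next(fill) for k in range(n))


def evaluate(pool, pop, cache):
    """Master sends only unseen schedules to the workers (duplicates skipped)."""
    new = [s for s in dict.fromkeys(pop) if s not in cache]
    for s, cost in zip(new, pool.map(toy_cost, new)):
        cache[s] = cost


def tournament(pop, cache, rng):
    """Binary tournament selection on cached costs (lower is better)."""
    a, b = rng.sample(pop, 2)
    return a if cache[a] <= cache[b] else b


def evolve(n_projects=8, pop_size=20, generations=60, n_workers=4, seed=0):
    """Return (best schedule, its cost, number of distinct evaluations)."""
    rng = random.Random(seed)
    cache = {}
    pop = [tuple(rng.sample(range(n_projects), n_projects)) for _ in range(pop_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(generations):
            evaluate(pool, pop, cache)
            children = [min(pop, key=cache.__getitem__)]  # elitism: keep the best
            while len(children) < pop_size:
                child = list(order_crossover(tournament(pop, cache, rng),
                                             tournament(pop, cache, rng), rng))
                if rng.random() < 0.2:  # swap mutation
                    x, y = rng.sample(range(n_projects), 2)
                    child[x], child[y] = child[y], child[x]
                children.append(tuple(child))
            pop = children
        evaluate(pool, pop, cache)
    best = min(pop, key=cache.__getitem__)
    return best, cache[best], len(cache)
```

The cache size returned by `evolve` counts the evaluations that were actually performed. Comparing it with `pop_size * (generations + 1)` shows how many duplicate simulations were avoided.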

Parallelization of the Lattice Boltzmann Method in Simulating Buoyancy-Driven Convection Heat Transfer

Heat Transfer, Volume 2 ◽

10.1115/imece2004-61871 ◽

2004 ◽

Author(s):

Anoosheh Niavarani-Kheirier ◽

Masoud Darbandi ◽

Gerry E. Schneider

Keyword(s):

Lattice Boltzmann Method ◽

Lattice Boltzmann ◽

Message Passing ◽

Large Scale ◽

Message Passing Interface ◽

Parallel Machines ◽

Convection Heat Transfer ◽

Wide Range ◽

Buoyancy Driven Convection ◽

Boltzmann Method

The main objective of the current work is to use the Lattice Boltzmann Method (LBM) to simulate buoyancy-driven flow with the hybrid thermal lattice Boltzmann equation (HTLBE). After the required formulations are derived, they are validated over a wide range of Rayleigh numbers for the buoyancy-driven square cavity problem. The performance of the method is investigated on parallel machines using the Message Passing Interface (MPI) library and a domain decomposition technique, in order to solve computationally demanding problems. The results show that the code is highly efficient for large-scale problems, with excellent speedup.
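The technique named above, domain decomposition with halo (ghost-cell) exchange, can be illustrated on a simpler stencil. In this sketch a 1-D explicit diffusion update is a hypothetical stand-in for the lattice Boltzmann step. The subdomains are simulated in one process. `halo_exchange` mimics the neighbour-to-neighbour transfer that ranks would perform, for example with `MPI_Sendrecv`.

```python
# Domain-decomposition sketch with halo (ghost-cell) exchange. A 1-D explicit
# diffusion stencil stands in for the lattice Boltzmann update; each subdomain
# would live on its own MPI rank, and halo_exchange mimics what neighbouring
# ranks would do (e.g. with MPI_Sendrecv) before every step.


def serial_step(u, alpha):
    """One explicit diffusion step; the two end values are held fixed."""
    inner = [u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
             for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]


def decompose(u, n_sub):
    """Block-split u; each block is [left ghost, owned cells..., right ghost]."""
    size, rem = divmod(len(u), n_sub)
    blocks, start = [], 0
    for r in range(n_sub):
        end = start + size + (1 if r < rem else 0)
        blocks.append([0.0] + u[start:end] + [0.0])
        start = end
    return blocks


def halo_exchange(blocks):
    """Copy each neighbour's edge cell into this block's ghost cells."""
    for r in range(len(blocks)):
        if r > 0:
            blocks[r][0] = blocks[r - 1][-2]   # left ghost <- left neighbour's last owned cell
        if r < len(blocks) - 1:
            blocks[r][-1] = blocks[r + 1][1]   # right ghost <- right neighbour's first owned cell


def local_step(block, is_first, is_last, alpha):
    """Update owned cells using only local data plus the ghost cells."""
    new = block[:]
    for k in range(1, len(block) - 1):
        if (is_first and k == 1) or (is_last and k == len(block) - 2):
            continue  # global boundary cell stays fixed
        new[k] = block[k] + alpha * (block[k - 1] - 2 * block[k] + block[k + 1])
    return new


def run_decomposed(u, alpha, steps, n_sub):
    """Advance u by `steps` steps on n_sub subdomains; return the global field."""
    blocks = decompose(u, n_sub)
    last = n_sub - 1
    for _ in range(steps):
        halo_exchange(blocks)
        blocks = [local_step(b, r == 0, r == last, alpha) for r, b in enumerate(blocks)]
    return [x for b in blocks for x in b[1:-1]]
```

Each subdomain reads only its own cells plus freshly exchanged ghost cells. The decomposed run therefore reproduces the serial result step for step, which is the property a parallel LBM code must also preserve.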

Reducing communication in algebraic multigrid with multi-step node aware communication

The International Journal of High Performance Computing Applications ◽

10.1177/1094342020925535 ◽

2020 ◽

Vol 34 (5) ◽

pp. 547-561

Author(s):

Amanda Bienz ◽

William D Gropp ◽

Luke N Olson

Keyword(s):

Message Passing ◽

Message Passing Interface ◽

Parallel Implementation ◽

Algebraic Multigrid ◽

Sparse Linear Systems ◽

Parallel Scalability ◽

Strong Scaling ◽

The Cost ◽

Communication Schedule ◽

Inter Process Communication

Algebraic multigrid (AMG) is often viewed as a scalable $O(n)$ solver for sparse linear systems. Yet AMG lacks parallel scalability because of increasingly large communication costs, both in the initial construction of the multigrid hierarchy and in the iterative solve phase. This work introduces a parallel implementation of AMG that reduces the cost of communication, yielding improved parallel scalability. In the Message Passing Interface (MPI), particularly in the MPI-everywhere approach, inter-process communication is commonly arranged so that messages are sent the same way regardless of where the sending and receiving processes are located. Performance tests show notable differences between the costs of intra- and internode communication, which motivates a restructuring of communication. Here, the communication schedule takes advantage of the less costly intra-node communication, reducing both the number and the size of internode messages. Node-centric communication is applied across the components of both the setup and solve phases of AMG, improving the weak and strong scaling of the entire method.
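The reduction in internode traffic can be illustrated with a simple count. The Python sketch below assumes ranks are placed in consecutive blocks of `ppn` per node and models only the internode stage. Standard communication sends one message per (source, destination) pair. The node-aware variant aggregates all data between a pair of nodes into one message, and it sends a value needed by several processes on the destination node only once. The additional on-node gather and redistribution steps are not counted. `internode_cost` is a hypothetical helper, not the authors' code.

```python
# Message-count sketch of node-aware communication. Ranks are placed in blocks
# of ppn per node (an assumption). Standard point-to-point sends one internode
# message per (src, dst) pair; the node-aware variant sends one message per
# (src node, dst node) pair and ships a value needed by several processes on
# the destination node only once. The extra on-node gather/scatter messages
# of the node-aware scheme are not counted here.
from collections import defaultdict


def internode_cost(sends, ppn):
    """sends: {(src_rank, dst_rank): set of value ids owned by src}.

    Returns ((standard_msgs, standard_words), (node_aware_msgs, node_aware_words)).
    """
    std_msgs = std_words = 0
    per_node_pair = defaultdict(set)
    for (src, dst), values in sends.items():
        src_node, dst_node = src // ppn, dst // ppn
        if src_node == dst_node or not values:
            continue  # on-node traffic is left unchanged in this model
        std_msgs += 1
        std_words += len(values)
        # All data from src_node to dst_node is aggregated into one message;
        # tagging by (src, value) keeps distinct values and drops duplicates.
        per_node_pair[(src_node, dst_node)].update((src, v) for v in values)
    na_msgs = len(per_node_pair)
    na_words = sum(len(items) for items in per_node_pair.values())
    return (std_msgs, std_words), (na_msgs, na_words)
```

Consider `ppn=2`, where rank 0 sends values `{1, 2}` to rank 2 and `{1}` to rank 3, and rank 1 sends `{5}` to rank 2. The standard scheme uses three internode messages carrying four values. The node-aware scheme uses one message carrying three.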
