SPATIOTEMPORAL DOMAIN DECOMPOSITION FOR MASSIVE PARALLEL COMPUTATION OF SPACE-TIME KERNEL DENSITY

Author(s):  
A. Hohl ◽  
E. M. Delmelle ◽  
W. Tang

Accelerated processing capabilities are deemed critical when conducting analysis on spatiotemporal datasets of increasing size, diversity and availability. High-performance parallel computing offers the capacity to solve computationally demanding problems in a limited timeframe, but likewise poses the challenge of preventing processing inefficiency due to workload imbalance between computing resources. Therefore, when designing new algorithms capable of implementing parallel strategies, careful spatiotemporal domain decomposition is necessary to account for heterogeneity in the data. In this study, we perform octree-based adaptive decomposition of the spatiotemporal domain for parallel computation of space-time kernel density. To avoid edge effects near subdomain boundaries, we establish spatiotemporal buffers that include adjacent data points lying within the spatial and temporal kernel bandwidths. We then quantify the computational intensity of each subdomain to balance workloads among processors. We illustrate the benefits of our methodology using a space-time epidemiological dataset of dengue fever, an infectious vector-borne disease that poses a severe threat to communities in tropical climates. Our parallel implementation of kernel density achieves substantial speedup compared to sequential processing and a high level of workload balance among processors, owing to accurate quantification of computational intensity. Our approach is portable to other space-time analytical methods.
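
A minimal sketch of the decomposition idea described above, assuming illustrative names such as `Subdomain`, `max_points` and a simple point-count threshold; it is not the authors' implementation, only one way to phrase octree splitting of an (x, y, t) domain with bandwidth buffers and a workload proxy:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Subdomain:
    bounds: np.ndarray     # shape (3, 2): (x, y, t) min/max of the cuboid
    core_idx: np.ndarray   # indices of points owned by this subdomain
    buffer_idx: np.ndarray # neighbouring points within the kernel bandwidths

def decompose(points, bounds, hs, ht, max_points=5000):
    """Recursively split the (x, y, t) domain into octants until each
    subdomain holds at most `max_points` points; attach a buffer of points
    within spatial bandwidth hs and temporal bandwidth ht."""
    inside = np.all((points >= bounds[:, 0]) & (points < bounds[:, 1]), axis=1)
    if inside.sum() <= max_points:
        lo = bounds[:, 0] - (hs, hs, ht)
        hi = bounds[:, 1] + (hs, hs, ht)
        buffered = np.all((points >= lo) & (points < hi), axis=1)
        return [Subdomain(bounds, np.where(inside)[0],
                          np.where(buffered & ~inside)[0])]
    mid = bounds.mean(axis=1)
    subdomains = []
    for corner in range(8):                       # the 8 octants of the cuboid
        bit = [(corner >> d) & 1 for d in range(3)]
        lo = np.where(bit, mid, bounds[:, 0])
        hi = np.where(bit, bounds[:, 1], mid)
        subdomains += decompose(points, np.stack([lo, hi], axis=1),
                                hs, ht, max_points)
    return subdomains

def computational_intensity(sub):
    # Workload proxy: kernel evaluations scale with core points x (core + buffer).
    return len(sub.core_idx) * (len(sub.core_idx) + len(sub.buffer_idx))
```

Subdomains, weighted by this intensity estimate, could then be assigned to processors so that each receives a roughly equal share of kernel evaluations.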

Author(s):  
Yasuhito Takahashi ◽  
Koji Fujiwara ◽  
Takeshi Iwashita ◽  
Hiroshi Nakashima

Purpose: This paper aims to propose a parallel-in-space-time finite-element method (FEM) for transient motor starting analyses. Although the domain decomposition method (DDM) is suitable for solving large-scale problems, and parallel-in-time (PinT) integration methods such as Parareal and the time-domain parallel FEM (TDPFEM) are effective for problems with a large number of time steps, their parallel performance saturates as the number of processes increases. To overcome this difficulty, a hybrid approach that combines the DDM with PinT integration is investigated in a highly parallel computing environment.
Design/methodology/approach: First, the parallel performances of the DDM, Parareal and TDPFEM were compared, because the scalability of these methods in highly parallel computation has not been discussed in depth. Then, the combination of the DDM and Parareal was investigated as a parallel-in-space-time FEM. The effectiveness of the developed method was demonstrated in transient starting analyses of induction motors.
Findings: Combining Parareal with the DDM can improve parallel performance in cases where the performance of the DDM, TDPFEM or Parareal alone saturates in highly parallel computation. When the number of unknowns is large and the number of available processes is limited, the DDM is the most effective from the standpoint of computational cost.
Originality/value: This paper newly develops the parallel-in-space-time FEM and demonstrates its effectiveness in nonlinear magnetoquasistatic field analyses of electric machines. This finding is significant because it clarifies a new direction for parallel computing techniques and the great potential for their further development.
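
As a rough illustration of the PinT component discussed above, the following Python sketch shows the generic Parareal iteration (not the authors' FEM code): a cheap coarse propagator G and an accurate fine propagator F are combined iteratively, and the fine solves over the time slices are the part that can run in parallel (written sequentially here for clarity).

```python
import numpy as np

def parareal(f_coarse, f_fine, u0, t, n_iter=5):
    """t: array of time-slice boundaries; returns the solution at each boundary."""
    n = len(t) - 1
    u = np.empty((n + 1,) + np.shape(u0))
    u[0] = u0
    for k in range(n):                                   # initial coarse sweep
        u[k + 1] = f_coarse(u[k], t[k], t[k + 1])
    for _ in range(n_iter):
        fine = [f_fine(u[k], t[k], t[k + 1]) for k in range(n)]  # parallelizable
        u_new = np.empty_like(u)
        u_new[0] = u0
        for k in range(n):                               # sequential correction
            u_new[k + 1] = (f_coarse(u_new[k], t[k], t[k + 1])
                            + fine[k]
                            - f_coarse(u[k], t[k], t[k + 1]))
        u = u_new
    return u

# Toy example: du/dt = -u, explicit Euler at two resolutions as G and F.
def propagator(steps):
    def step(u, t0, t1):
        dt = (t1 - t0) / steps
        for _ in range(steps):
            u = u + dt * (-u)
        return u
    return step

u = parareal(propagator(1), propagator(100), 1.0, np.linspace(0.0, 2.0, 9))
```

In the hybrid scheme, each fine solve would itself be a spatial FEM problem distributed with the DDM, so space and time parallelism multiply.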


2012 ◽  
Vol 2012 ◽  
pp. 1-19
Author(s):  
G. Ozdemir Dag ◽  
Mustafa Bagriyanik

Unscheduled power flows must be minimized or controlled as quickly as possible in a deregulated power system, since transmission systems are mostly operated at, or very close to, their power-carrying limits. The time spent on simulations to determine the current states of all system and control variables of the interconnected power system is therefore critical, so that the necessary action can be taken in case of equipment failure or any other undesired situation. Using supercomputing facilities together with parallel computing techniques greatly decreases the computation time. In this study, a parallel implementation of a multiobjective optimization approach based on genetic algorithms and fuzzy decision making to manage unscheduled flows is presented. Parallel computation techniques are applied on supercomputers (high-performance computers). The proposed method is applied to the IEEE 300-bus test system. Two different cases for the GA parameters are considered to demonstrate the benefit of the parallel computation technique, and the simulation results are presented.
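
A minimal, hypothetical sketch of the parallelization pattern described above: fitness evaluation of a GA population distributed over worker processes, with a fuzzy (min-membership) aggregation of two objectives. The objective functions and limits below are placeholders, not the paper's power-flow model.

```python
from multiprocessing import Pool
import random

def objectives(x):
    # Placeholder objectives standing in for unscheduled-flow and loss terms.
    f1 = sum(v * v for v in x)
    f2 = sum(abs(v - 1.0) for v in x)
    return f1, f2

def fuzzy_fitness(x, f1_max=50.0, f2_max=20.0):
    f1, f2 = objectives(x)
    mu1 = max(0.0, 1.0 - f1 / f1_max)   # membership: smaller objective is better
    mu2 = max(0.0, 1.0 - f2 / f2_max)
    return min(mu1, mu2)                 # fuzzy decision: worst satisfaction level

if __name__ == "__main__":
    population = [[random.uniform(0.9, 1.1) for _ in range(30)]
                  for _ in range(200)]
    with Pool() as pool:                 # evaluate individuals in parallel
        fitness = pool.map(fuzzy_fitness, population)
    best = population[max(range(len(fitness)), key=fitness.__getitem__)]
```

Because GA fitness evaluations are independent, this step dominates the runtime and is the natural place to exploit a supercomputer's parallelism.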


Author(s):  
Breno A. de Melo Menezes ◽  
Nina Herrmann ◽  
Herbert Kuchen ◽  
Fernando Buarque de Lima Neto

Parallel implementations of swarm intelligence algorithms such as ant colony optimization (ACO) have been widely used to shorten the execution time when solving complex optimization problems. When targeting a GPU environment, developing efficient parallel versions of such algorithms in CUDA can be a difficult and error-prone task even for experienced programmers. To overcome this issue, the parallel programming model of algorithmic skeletons simplifies parallel programs by abstracting from low-level features. This is realized by defining common programming patterns (e.g. map, fold and zip) that are later converted into efficient parallel code. In this paper, we show how algorithmic skeletons formulated in the domain-specific language Musket can cope with the development of a parallel implementation of ACO, and how the result compares to a low-level implementation. Our experimental results show that Musket suits the development of ACO. Besides making it easier for the programmer to deal with the parallelization aspects, Musket generates high-performance code with execution times similar to those of low-level implementations.
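
Musket itself is a DSL that generates low-level parallel code; the following Python sketch is only an analogy, showing how one ACO iteration can be phrased with map/zip/fold-style patterns (the functions here are plain Python stand-ins, not the Musket API).

```python
import random
from functools import reduce

N_CITIES, N_ANTS, EVAPORATION = 20, 64, 0.5
dist = [[abs(i - j) + 1 for j in range(N_CITIES)] for i in range(N_CITIES)]
pheromone = [[1.0] * N_CITIES for _ in range(N_CITIES)]

def construct_tour(_ant):
    # Randomized tour construction guided by pheromone / distance (simplified rule).
    tour, unvisited = [0], set(range(1, N_CITIES))
    while unvisited:
        cur = tour[-1]
        weights = [pheromone[cur][j] / dist[cur][j] for j in unvisited]
        tour.append(random.choices(list(unvisited), weights=weights)[0])
        unvisited.discard(tour[-1])
    return tour

def tour_length(tour):
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

tours = list(map(construct_tour, range(N_ANTS)))          # map skeleton
lengths = list(map(tour_length, tours))                   # map skeleton
best = reduce(lambda a, b: a if a[1] <= b[1] else b,      # fold skeleton
              zip(tours, lengths))                        # zip skeleton
for i in range(N_CITIES):                                 # evaporation + deposit
    pheromone[i] = [p * EVAPORATION for p in pheromone[i]]
for a, b in zip(best[0], best[0][1:] + best[0][:1]):
    pheromone[a][b] += 1.0 / best[1]
```

The appeal of the skeleton approach is that the map over ants and the fold over tour lengths can be compiled to GPU kernels and reductions without the programmer writing CUDA by hand.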


Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 627
Author(s):  
David Marquez-Viloria ◽  
Luis Castano-Londono ◽  
Neil Guerrero-Gonzalez

A methodology for scalable and concurrent real-time implementation of highly recurrent algorithms is presented and experimentally validated on the AWS-FPGA platform. This paper presents a parallel implementation of a KNN algorithm targeted at m-QAM demodulators, using high-level synthesis for fast prototyping, parameterization, and scalability of the design. The proposed design shows the successful implementation of the KNN algorithm for interchannel interference mitigation in a 3 × 16 Gbaud 16-QAM Nyquist WDM system. Additionally, we present a modified version of the KNN algorithm in which comparisons among data symbols are reduced by identifying the closest neighbor using the 8-connected-clusters rule employed in image processing. The real-time implementation of the modified KNN on a Xilinx Virtex UltraScale+ VU9P AWS-FPGA board was compared with results obtained in previous work using the same data from the same experimental setup, but with offline DSP in Matlab. The results show that the difference is negligible below the FEC limit. Additionally, the modified KNN reduces the number of operations by 43% to 75%, depending on the symbol's position in the constellation, achieving a 47.25% reduction in total computation time for 100 K input symbols processed on 20 parallel cores compared to the standard KNN algorithm.
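
A hypothetical Python sketch of the reduction idea (not the authors' HLS code): instead of comparing a received symbol against training symbols from all 16 constellation clusters, only the nearest ideal 16-QAM point and its 8-connected neighbours on the constellation grid are searched. Corner symbols keep 4 of 16 clusters and interior symbols 9 of 16, which is consistent with the 43-75% range reported above.

```python
import numpy as np

LEVELS = np.array([-3, -1, 1, 3])               # ideal 16-QAM I/Q levels

def candidate_clusters(symbol):
    """(i, q) grid indices of the closest cluster and its 8-connected
    neighbours; corner and edge clusters naturally have fewer neighbours."""
    i0 = int(np.argmin(np.abs(LEVELS - symbol.real)))
    q0 = int(np.argmin(np.abs(LEVELS - symbol.imag)))
    return [(i, q) for i in range(max(0, i0 - 1), min(4, i0 + 2))
                   for q in range(max(0, q0 - 1), min(4, q0 + 2))]

def knn_demodulate(symbol, train_syms, train_labels, k=7):
    """train_labels[j] is the cluster index 0..15 of training symbol j,
    with I index = label % 4 and Q index = label // 4."""
    allowed = set(candidate_clusters(symbol))
    mask = np.array([(lbl % 4, lbl // 4) in allowed for lbl in train_labels])
    cand_syms, cand_lbls = train_syms[mask], train_labels[mask]
    nearest = np.argsort(np.abs(cand_syms - symbol))[:k]   # k nearest by distance
    return int(np.bincount(cand_lbls[nearest]).argmax())   # majority vote
```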


2019 ◽  
Vol 38 (9) ◽  
pp. 4014-4039 ◽  
Author(s):  
Matheus F. Torquato ◽  
Marcelo A. C. Fernandes

2013 ◽  
Vol 411-414 ◽  
pp. 585-588
Author(s):  
Liu Yang ◽  
Tie Ying Liu

This paper introduces the parallel features of the GPU and applies GPU parallel computation to parallelize the path-search process of Particle Swarm Optimization (PSO), reducing its increasingly high time and space complexity. The experimental results show that, compared with the CPU implementation, the GPU implementation improves the search rate and shortens the computation time.
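
A minimal sketch of the data parallelism exploited on the GPU: every particle's velocity/position update and fitness evaluation is independent, so each maps naturally to one GPU thread. NumPy arrays stand in for device memory here, and the fitness function is a placeholder, not the paper's path-search objective.

```python
import numpy as np

def pso(fitness, dim=2, n_particles=1024, iters=200, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    x = rng.uniform(-5, 5, (n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), fitness(x)
    gbest = pbest[pbest_val.argmin()]
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # per particle
        x = x + v                                                  # per particle
        val = fitness(x)                                           # per particle
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], val[improved]
        gbest = pbest[pbest_val.argmin()]                          # reduction
    return gbest, pbest_val.min()

best_x, best_f = pso(lambda x: np.sum(x ** 2, axis=1))  # placeholder objective
```

On a GPU, the element-wise updates become one thread per particle and the global-best selection becomes a parallel reduction, which is where the reported speedup over the CPU comes from.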


1997 ◽  
Vol 6 (1) ◽  
pp. 127-152
Author(s):  
Eric De Sturler ◽  
Volker Strumpen

Recently, the first commercial High Performance Fortran (HPF) subset compilers have appeared. This article reports on our experiences with the xHPF compiler of Applied Parallel Research, version 1.2, for the Intel Paragon. At this stage, we do not expect very high performance from our HPF programs, even though performance will eventually be of paramount importance for the acceptance of HPF. Instead, our primary objective is to study how to convert large Fortran 77 (F77) programs to HPF such that the compiler generates reasonably efficient parallel code. We report on a case study that identifies several problems when parallelizing code with HPF; most of these problems affect current HPF compiler technology in general, although some are specific to the xHPF compiler. We discuss our solutions from the perspective of the scientific programmer, and present timing results on the Intel Paragon. The case study comprises three programs of different complexity with respect to parallelization. We use the dense matrix-matrix product to show that the distribution of arrays and the order of nested loops significantly influence the performance of the parallel program. We use Gaussian elimination with partial pivoting to study the parallelization strategy of the compiler; there are various ways to structure this algorithm for a particular data distribution, and this example shows how much effort may be demanded from the programmer to support the compiler in generating an efficient parallel implementation. Finally, we use a small application to show that the more complicated structure of a larger program may introduce problems for parallelization, even though all subroutines of the application are easy to parallelize by themselves. The application consists of a finite volume discretization on a structured grid and a nested iterative solver. Our case study shows that it is possible to obtain reasonably efficient parallel programs with xHPF, although the compiler needs substantial support from the programmer.
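
The programs in the case study are Fortran/HPF; the Python sketch below only illustrates the array-distribution idea behind an HPF (BLOCK, *) distribution for the dense matrix-matrix product: each process computes the rows of C that correspond to its block of A, while B is replicated.

```python
from multiprocessing import Pool
import numpy as np

def multiply_block(args):
    a_block, b = args
    return a_block @ b                    # local dense product on one process

def distributed_matmul(a, b, n_procs=4):
    blocks = np.array_split(a, n_procs, axis=0)   # block distribution of rows
    with Pool(n_procs) as pool:
        c_blocks = pool.map(multiply_block, [(blk, b) for blk in blocks])
    return np.vstack(c_blocks)

if __name__ == "__main__":
    a, b = np.random.rand(512, 512), np.random.rand(512, 512)
    assert np.allclose(distributed_matmul(a, b), a @ b)
```

Choosing which dimension to distribute, and ordering the local loops to match that layout, is exactly the kind of decision the article reports as decisive for the performance of the compiler-generated parallel code.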

