Area Optimization of Slicing Floorplans in Parallel

This paper presents a new sequential algorithm to answer the question about the existence of a causal explanation for a set of independence statements (a dependency model), which is consistent with a given set of background knowledge. Emphasis is placed on generality, efficiency and ease of parallelization of the algorithm. From this sequential algorithm, an efficient, scalable, and easy to implement parallel algorithm with very little inter-processor communication is derived.

Analysis of a Step-Based Watershed Algorithm Using CUDA

International Journal of Natural Computing Research ◽

10.4018/jncr.2010100102 ◽

2010 ◽

Vol 1 (4) ◽

pp. 16-28 ◽

Cited By ~ 2

Author(s):

Giovani Bernardes Vitor ◽

André Körbes ◽

Roberto de Alencar Lotufo ◽

Janito Vaqueiro Ferreira

Keyword(s):

Parallel Algorithms ◽

Parallel Algorithm ◽

Execution Time ◽

Sequential Algorithm ◽

Graphics Hardware ◽

Watershed Algorithm ◽

Sequential Approach ◽

Watershed Transform ◽

Average Speed ◽

Hybrid Approaches

This paper proposes and develops a parallel algorithm for the watershed transform, with application on graphics hardware. Existing proposals are discussed and their main aspects briefly analysed. The algorithm is proposed as a procedure of four steps, where each step performs a task using different approaches inspired by existing techniques. The algorithm is implemented using the CUDA libraries and its performance is measured on the GPU and compared to a sequential algorithm running on the CPU, achieving an average speedup of about two over the sequential approach. This work improves on previous results of hybrid approaches and parallel algorithms with many steps of synchronisation and iterations between CPU and GPU.

PARALLEL ALGORITHM FOR SECOND–ORDER RESTRICTED WEAK INTEGER COMPOSITION GENERATION FOR SHARED MEMORY MACHINES

Parallel Processing Letters ◽

10.1142/s0129626413500102 ◽

2013 ◽

Vol 23 (03) ◽

pp. 1350010 ◽

Cited By ~ 1

Author(s):

DANIEL R. PAGE

Keyword(s):

Load Balancing ◽

Parallel Algorithm ◽

Shared Memory ◽

Second Order ◽

Sequential Algorithm ◽

Generation Algorithm ◽

Combinatorial Generation ◽

Order Restricted

In 2012, Page presented a sequential combinatorial generation algorithm for generalized types of restricted weak integer compositions called second–order restricted weak integer compositions. Second–order restricted weak integer compositions cover various types of restricted weak integer compositions of n parts such as integer compositions, bounded compositions, and part–wise integer compositions. In this paper, we present a parallel algorithm that derives from our parallelization of Page's sequential algorithm with a focus on load balancing for shared memory machines.
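
As a rough illustration of one of the special cases named above, the following Python sketch enumerates weak integer compositions of n in which each part has its own lower and upper bound. It is not Page's algorithm and does not cover the full second-order restricted family; the function name `bounded_compositions` and the `lower`/`upper` parameters are assumptions introduced only for this example.

```python
def bounded_compositions(n, lower, upper):
    """Yield every tuple (p_0, ..., p_{k-1}) of integers with
    lower[i] <= p_i <= upper[i] and p_0 + ... + p_{k-1} == n.

    Hypothetical helper for illustration; not taken from the paper.
    """
    k = len(lower)

    def rec(i, remaining, prefix):
        if i == k:
            if remaining == 0:
                yield tuple(prefix)
            return
        # Smallest and largest totals the parts after position i can reach.
        min_rest = sum(lower[i + 1:])
        max_rest = sum(upper[i + 1:])
        # Prune: only try values that still allow the sum to hit n exactly.
        lo = max(lower[i], remaining - max_rest)
        hi = min(upper[i], remaining - min_rest)
        for part in range(lo, hi + 1):
            prefix.append(part)
            yield from rec(i + 1, remaining - part, prefix)
            prefix.pop()

    yield from rec(0, n, [])
```

Each choice of the first part starts an independent subproblem, so one plausible way to spread the work over shared-memory threads is to hand out first-part values as tasks; how the paper actually balances the load is not stated in the abstract.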

Analysis of a Step-Based Watershed Algorithm Using CUDA

Nature-Inspired Computing Design, Development, and Applications ◽

10.4018/978-1-4666-1574-8.ch018 ◽

2012 ◽

pp. 321-335 ◽

Cited By ~ 2

Author(s):

Giovani Bernardes Vitor ◽

André Körbes ◽

Roberto de Alencar Lotufo ◽

Janito Vaqueiro Ferreira

Keyword(s):

Parallel Algorithms ◽

Parallel Algorithm ◽

Execution Time ◽

Sequential Algorithm ◽

Graphics Hardware ◽

Watershed Algorithm ◽

Sequential Approach ◽

Watershed Transform ◽

Average Speed ◽

Hybrid Approaches

This paper proposes and develops a parallel algorithm for the watershed transform, with application on graphics hardware. Existing proposals are discussed and their main aspects briefly analysed. The algorithm is proposed as a procedure of four steps, where each step performs a task using different approaches inspired by existing techniques. The algorithm is implemented using the CUDA libraries and its performance is measured on the GPU and compared to a sequential algorithm running on the CPU, achieving an average speedup of about two over the sequential approach. This work improves on previous results of hybrid approaches and parallel algorithms with many steps of synchronisation and iterations between CPU and GPU.

A Multi-Branch-and-Bound Binary Parallel Algorithm to Solve the Knapsack Problem 0–1 in a Multicore Cluster

Applied Sciences ◽

10.3390/app9245368 ◽

2019 ◽

Vol 9 (24) ◽

pp. 5368 ◽

Cited By ~ 1

Author(s):

José Crispín Zavala-Díaz ◽

Marco Antonio Cruz-Chávez ◽

Jacqueline López-Calderón ◽

José Alberto Hernández-Aguilar ◽

Martha Elena Luna-Ortíz

Keyword(s):

Exact Solution ◽

Parallel Algorithms ◽

Parallel Algorithm ◽

Branch And Bound ◽

Knapsack Problem ◽

Multicore Processors ◽

Sequential Algorithm ◽

Branch And Bound Algorithms ◽

Superlinear Speedup

This paper presents a process that is based on sets of parts, where elements are fixed and removed to form different binary branch-and-bound (BB) trees, which in turn are used to build a parallel algorithm called “multi-BB”. These sequential and parallel algorithms calculate the exact solution for the 0–1 knapsack problem. The sequential algorithm solves instances published by other researchers (and those proposed by Pisinger) from the not-so-complex (uncorrelated) class and some problems of the medium-complex (weakly correlated) class. The parallel algorithm, running on a cluster of multicore processors, solves the weakly correlated problems that the sequential algorithm cannot solve. The multi-branch-and-bound algorithms obtained parallel efficiencies of approximately 75%, and in some cases a superlinear speedup was obtained.
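
For readers unfamiliar with the underlying technique, here is a minimal Python sketch of a single sequential binary branch-and-bound tree for the 0–1 knapsack problem, where each level either fixes an item into the solution or removes it. It is a textbook-style illustration, not the paper's multi-BB algorithm; the function name `knapsack_bb`, the fractional upper bound, and the assumption of strictly positive weights are choices made for this example.

```python
def knapsack_bb(values, weights, capacity):
    """Return the best total value of a 0-1 knapsack instance.

    Illustrative sketch only; assumes all weights are positive.
    """
    # Order items by value density so the fractional bound is easy to compute.
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    n = len(items)
    best = 0

    def bound(i, value, room):
        # Optimistic estimate: fill greedily, allowing a fraction of one item.
        for v, w in items[i:]:
            if w <= room:
                room -= w
                value += v
            else:
                return value + v * room / w
        return value

    def branch(i, value, room):
        nonlocal best
        if value > best:
            best = value
        # Stop when no items remain or the bound cannot beat the incumbent.
        if i == n or bound(i, value, room) <= best:
            return
        v, w = items[i]
        if w <= room:
            branch(i + 1, value + v, room - w)  # item i fixed in the solution
        branch(i + 1, value, room)              # item i removed

    branch(0, 0, capacity)
    return best
```

A parallel variant could, in principle, explore several such trees with different items fixed at the root and share the incumbent value between workers; whether multi-BB is organised this way is not stated in the abstract.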

Practical Wavelet Tree Construction

Journal of Experimental Algorithmics ◽

10.1145/3457197 ◽

2021 ◽

Vol 26 ◽

pp. 1-67

Author(s):

Patrick Dinklage ◽

Jonas Ellert ◽

Johannes Fischer ◽

Florian Kurpicz ◽

Marvin Löbel

Keyword(s):

Parallel Algorithms ◽

Shared Memory ◽

Distributed Memory ◽

Auxiliary Information ◽

Parallel Computers ◽

External Memory ◽

Sequential Algorithms ◽

Bottom Up ◽

Memory Efficiency ◽

Tree Construction

We present new sequential and parallel algorithms for wavelet tree construction based on a new bottom-up technique. This technique makes use of the structure of the wavelet trees (refining the characters represented in a node of the tree with increasing depth) in an opposite way, by first computing the leaves (most refined), and then propagating this information upwards to the root of the tree. We first describe new sequential algorithms, both in RAM and external memory. Based on these results, we adapt these algorithms to parallel computers, where we address both shared memory and distributed memory settings. In practice, all our algorithms outperform previous ones in both time and memory efficiency, because we can compute all auxiliary information solely based on the information we obtained from computing the leaves. Most of our algorithms are also adapted to the wavelet matrix, a variant that is particularly suited for large alphabets.

An Efficient Parallel Algorithm for Rectangular Packing Based on Bintree Expression

Volume 3: 22nd Design Automation Conference ◽

10.1115/96-detc/dac-1043 ◽

1996 ◽

Author(s):

Aihu Wang ◽

Jianzhong Cha ◽

Jinmin Wang

Keyword(s):

Computational Complexity ◽

Problem Solving ◽

Parallel Algorithm ◽

Shared Memory ◽

Optimal Packing ◽

Packing Process ◽

Left Corner ◽

Rectangular Packing ◽

Packing Scheme ◽

Sequential Decomposition

In this paper, a method using a bintree structure to express the states of the packing space in rectangular packing is proposed. Through sequential decomposition of the packing space, an optimal packing scheme for rectangles of various sizes can be obtained by repeatedly placing the optimal piece that satisfies specific conditions into the current packing space and locating it at the upper-left corner of that space. Different optimal packing schemes that satisfy different demands can be obtained by adjusting the values of the ordering factors KA and KB. A parallel algorithm for a SIMD-CREW shared-memory computer is designed through analysis of the parallelism of the bintree expression. The whole packing process is clearly expressed by the bintree. The computational complexity of the algorithm is shown to be O(n^2 log n). Both the experimental results and the comparison with other sequential packing algorithms show that the parallel packing algorithm is efficient; it nearly doubles the problem-solving speed.

Parallel algorithms for LU decomposition on a shared memory multiprocessor

Applied Mathematics and Computation ◽

10.1016/j.amc.2004.01.027 ◽

2005 ◽

Vol 163 (1) ◽

pp. 179-191 ◽

Cited By ~ 7

Author(s):

Dogan Kaya ◽

Ken Wright

Keyword(s):

Parallel Algorithms ◽

Shared Memory ◽

Lu Decomposition ◽

Shared Memory Multiprocessor

Parallel Strategy to Factorize Fermat Numbers with Implementation in Maple Software

Journal of Software ◽

10.17706/jsw.16.4.167-173 ◽

2021 ◽

pp. 167-173

Author(s):

Jianhui Li ◽

Manlan Liu

Keyword(s):

Parallel Computing ◽

Parallel Algorithm ◽

Computational Efficiency ◽

Sequential Algorithm ◽

Parallel Processes ◽

Fermat Numbers ◽

Fermat Number ◽

Parallel Strategy

In accordance with the traits of parallel computing, the paper proposes a parallel algorithm to factorize the Fermat numbers through parallelization of a sequential algorithm. The core of the parallelization is to subdivide the computing interval into subintervals that are assigned to parallel processes. Maple experiments show that the parallelization increases the computational efficiency of factoring Fermat numbers, especially Fermat numbers with large divisors.
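
The interval-splitting idea can be sketched in Python rather than Maple. The sketch below is an assumption-laden illustration, not the paper's method: it relies on the classical result that, for n ≥ 2, every factor of the Fermat number F_n = 2^(2^n) + 1 has the form k·2^(n+2) + 1, and it searches a range of k values by trial division. The names `split_interval`, `search_subinterval`, and `find_factor`, and the `executor_cls` parameter, are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor


def fermat_number(n):
    return 2 ** (2 ** n) + 1


def split_interval(k_min, k_max, parts):
    """Split the inclusive range [k_min, k_max] into at most `parts`
    contiguous half-open ranges (start, end) of near-equal size."""
    total = k_max - k_min + 1
    size, extra = divmod(total, parts)
    bounds, start = [], k_min
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        if end > start:
            bounds.append((start, end))
        start = end
    return bounds


def search_subinterval(task):
    """Trial-divide F_n by candidates k * 2**(n + 2) + 1 for k in [k_start, k_end).
    Assumes n >= 2, where every factor of F_n has this form."""
    n, k_start, k_end = task
    f = fermat_number(n)
    step = 2 ** (n + 2)
    for k in range(k_start, k_end):
        d = k * step + 1
        if d < f and f % d == 0:
            return d
    return None


def find_factor(n, k_max, workers=4, executor_cls=ProcessPoolExecutor):
    """Search k = 1..k_max in parallel; return the smallest factor found, or None."""
    tasks = [(n, a, b) for a, b in split_interval(1, k_max, workers)]
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(search_subinterval, tasks))
    found = [d for d in results if d is not None]
    return min(found) if found else None
```

For example, `find_factor(5, 100)` should recover the known factor 641 of F_5 (k = 5). On platforms that start worker processes by spawning, the call needs to sit under an `if __name__ == "__main__":` guard. The subintervals are independent, so the only communication is collecting the per-worker results.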

A Shared Memory Parallel Algorithm for Logic Synthesis

The Sixth International Conference on VLSI Design ◽

10.1109/icvd.1993.669703 ◽

2005 ◽

Cited By ~ 1

Author(s):

Chieng-Fai Lim ◽

P. Banerjee ◽

K. De ◽

S. Muroga

Keyword(s):

Parallel Algorithm ◽

Shared Memory ◽

Logic Synthesis
