bulk synchronous parallel
Recently Published Documents


TOTAL DOCUMENTS: 116 (five years: 10)

H-INDEX: 11 (five years: 2)

2021 ◽  
Author(s):  
Xing Zhao ◽  
Manos Papagelis ◽  
Aijun An ◽  
Bao Xin Chen ◽  
Junfeng Liu ◽  
...  


2021 ◽  
Vol 11 (11) ◽  
pp. 4785
Author(s):  
Yingchi Mao ◽  
Zijian Tu ◽  
Fagang Xi ◽  
Qingyong Wang ◽  
Shufang Xu

The rapid development of artificial intelligence technology has made deep neural networks (DNNs) widely used in many fields, and DNNs keep growing in order to improve the accuracy and quality of the models. Traditional data and model parallelism are hard to scale because of communication bottlenecks and poor hardware efficiency. Pipeline parallelism, by contrast, trains multiple batches concurrently, reducing training overhead and thus achieving better acceleration. Because allocating pipeline-parallel tasks across heterogeneous computing resources is a complex problem, this paper proposes TAPP, a task-allocation method for pipeline parallelism based on deep reinforcement learning. In TAPP, a predictive network is trained by policy gradient until it produces the optimal pipeline-parallel task-allocation scheme, which speeds up model training. Experimental results show that, on average, compared with data parallelism under bulk synchronous parallel (BSP), TAPP reduces the single-step training time by a factor of 1.37 and lowers the proportion of time spent on communication by 48.92%.
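To make the mechanism concrete, here is a minimal, self-contained sketch of policy-gradient task allocation in the spirit of TAPP. Everything below (the cost model, layer and device parameters, hyperparameters) is an illustrative stand-in, not the authors' actual predictive network or reward; it only shows the REINFORCE-style loop such a method relies on.

```python
import numpy as np

rng = np.random.default_rng(0)

n_layers, n_devices = 8, 4
layer_cost = rng.uniform(1.0, 4.0, n_layers)     # toy per-layer compute cost
device_speed = rng.uniform(0.5, 1.5, n_devices)  # heterogeneous device speeds
comm_cost = 0.5                                  # toy cost per inter-device cut

# Policy: an independent softmax over devices for every layer.
logits = np.zeros((n_layers, n_devices))

def sample_assignment(logits):
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    assign = np.array([rng.choice(n_devices, p=row) for row in probs])
    return assign, probs

def step_time(assign):
    # Single-step pipeline time ~ slowest stage plus communication at cuts.
    per_device = np.zeros(n_devices)
    for layer, dev in enumerate(assign):
        per_device[dev] += layer_cost[layer] / device_speed[dev]
    cuts = np.count_nonzero(assign[1:] != assign[:-1])
    return per_device.max() + comm_cost * cuts

lr, baseline = 0.1, None
for _ in range(2000):
    assign, probs = sample_assignment(logits)
    reward = -step_time(assign)              # faster steps => higher reward
    baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
    for layer, dev in enumerate(assign):
        grad = -probs[layer]                 # REINFORCE: onehot(dev) - probs
        grad[dev] += 1.0
        logits[layer] += lr * (reward - baseline) * grad

best = logits.argmax(axis=1)
print("assignment:", best, "estimated step time:", round(step_time(best), 3))
```

The policy here factorizes over layers and the reward is a crude analytic cost; the paper's predictive network and reward signal are more sophisticated, but the update against a moving baseline is the same basic mechanism.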


Author(s):  
Rob H. Bisseling

This chapter is a self-contained tutorial which tells you how to get started with parallel programming and how to design and implement parallel algorithms in a structured way using supersteps. It introduces a simple target architecture for designing parallel algorithms, the bulk synchronous parallel (BSP) computer. Using the computation of the inner product of two vectors as an example, the chapter shows how an algorithm is designed, hand in hand with its cost analysis. The inner-product algorithm is implemented in a short program that demonstrates the most important primitives of the communication library, BSPlib. Furthermore, a benchmarking program is given for measuring the BSP parameters of a parallel computer. Its use is demonstrated on a desktop computer and a supercomputer. Finally, a parallel regular sampling sort algorithm is presented, implemented, and tested.
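The superstep structure of the inner-product algorithm can be sketched as follows: a sequential Python simulation of the pattern, assuming a block distribution of the vectors. The chapter's actual implementation is a C program built on BSPlib primitives (bsp_put, bsp_sync, and so on), which this sketch only imitates.

```python
import numpy as np

def bsp_inner_product(x, y, p):
    """Simulate the BSP inner-product pattern on p processors."""
    n = len(x)
    blocks = [(s * n // p, (s + 1) * n // p) for s in range(p)]

    # Superstep (0), computation: each processor s computes the partial
    # inner product of its own block of x and y.
    partial = [float(np.dot(x[lo:hi], y[lo:hi])) for lo, hi in blocks]

    # Communication phase: every processor "puts" its partial sum into an
    # array registered on every other processor; the barrier delivers them.
    inbox = [[partial[s] for s in range(p)] for _ in range(p)]

    # Superstep (1), computation: each processor adds the p received
    # partials, so every processor ends up holding the full inner product.
    return [sum(inbox[t]) for t in range(p)]

x = np.arange(1.0, 9.0)            # 1, 2, ..., 8
y = np.ones(8)
print(bsp_inner_product(x, y, 4))  # every processor holds 36.0
```

The cost analysis mentioned above falls out of this structure directly: one superstep of n/p multiply-adds, one all-to-all exchange of p-1 words per processor, and one superstep of p-1 additions.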


Author(s):  
Rob H. Bisseling

This book explains how to use the bulk synchronous parallel (BSP) model to design and implement parallel algorithms in the areas of scientific computing and big data. Furthermore, it presents a hybrid BSP approach towards new hardware developments such as hierarchical architectures with both shared and distributed memory. The book provides a full treatment of core problems in scientific computing and big data, starting from a high-level problem description, via a sequential solution algorithm to a parallel solution algorithm and an actual parallel program written in the communication library BSPlib. Numerical experiments are presented for parallel programs on modern parallel computers ranging from desktop computers to massively parallel supercomputers. The introductory chapter of the book gives a complete overview of BSPlib, so that readers are able to write their own parallel programs at an early stage. Furthermore, it treats BSP benchmarking and parallel sorting by regular sampling. The next three chapters treat basic numerical linear algebra problems such as linear system solving by LU decomposition, sparse matrix-vector multiplication (SpMV), and the fast Fourier transform (FFT). The final chapter explores parallel algorithms for big data problems such as graph matching. The book is accompanied by the software package BSPedupack, freely available online from the author's homepage, which contains all the programs of the book and a set of test programs.
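As a taste of the material, the following is a sequential Python sketch of parallel sorting by regular sampling (PSRS), the sorting algorithm treated in the introductory chapter. The splitter-selection rule below is one common textbook variant and the simulation ignores communication costs; the book's BSPlib version differs in detail.

```python
import numpy as np

def psrs(data, p):
    """Simulate parallel sorting by regular sampling on p processors."""
    # Step 1: split the input into p blocks and sort each block locally.
    blocks = [np.sort(b) for b in np.array_split(np.asarray(data), p)]
    # Step 2: each processor picks p regular samples from its sorted block.
    samples = np.sort(np.concatenate(
        [b[np.linspace(0, len(b) - 1, p, dtype=int)] for b in blocks]))
    # Step 3: choose p-1 global splitters from the p*p gathered samples.
    splitters = samples[p::p]
    # Step 4: partition every block by the splitters; processor j receives
    # the elements of splitter interval j from all blocks (the "exchange").
    buckets = [[] for _ in range(p)]
    for b in blocks:
        parts = np.split(b, np.searchsorted(b, splitters))
        for j in range(p):
            buckets[j].append(parts[j])
    # Step 5: each processor sorts its bucket; concatenation is the result.
    return np.concatenate([np.sort(np.concatenate(bkt)) for bkt in buckets])

rng = np.random.default_rng(1)
a = rng.integers(0, 1000, size=40)
assert np.array_equal(psrs(a, 4), np.sort(a))
print("PSRS output matches np.sort")
```

Regular sampling is what keeps the buckets balanced: because the splitters are drawn evenly from every locally sorted block, no processor receives much more than about 2n/p elements.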


Author(s):  
Xing Zhao ◽  
Manos Papagelis ◽  
Aijun An ◽  
Bao Xin Chen ◽  
Junfeng Liu ◽  
...  

2019 ◽  
Vol 184 ◽  
pp. 102319 ◽  
Author(s):  
Flavio Ferrarotti ◽  
Senén González ◽  
Klaus-Dieter Schewe

2019 ◽  
Vol 19 (5-6) ◽  
pp. 1056-1072 ◽  
Author(s):  
Ariyam Das ◽  
Carlo Zaniolo

A large class of traditional graph and data mining algorithms can be concisely expressed in Datalog, and other logic-based languages, once aggregates are allowed in recursion. In fact, for most BigData algorithms, the difficult semantic issues raised by the use of non-monotonic aggregates in recursion are solved by Pre-Mappability (PreM), a property that assures that for a program with aggregates in recursion there is an equivalent aggregate-stratified program. In this paper we show that, by bringing together the formal abstract semantics of stratified programs with the efficient operational one of unstratified programs, PreM can also facilitate and improve their parallel execution. We prove that PreM-optimized lock-free and decomposable parallel semi-naive evaluations produce the same results as the single-executor programs. Therefore, PreM can be assimilated into the data-parallel computation plans of different distributed systems, irrespective of whether these follow bulk synchronous parallel (BSP) or asynchronous computing models. In addition, we show that non-linear recursive queries can be evaluated using a hybrid stale synchronous parallel (SSP) model on distributed environments. After providing a formal correctness proof for the recursive query evaluation with PreM under this relaxed synchronization model, we present experimental evidence of its benefits.
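The core idea is easy to see on a toy example. In the following Python sketch (the shortest-path program is the standard textbook example, not the paper's benchmark), the min aggregate is applied inside a semi-naive fixpoint computation, PreM-style, yet it produces the same answer as the aggregate-stratified program that would compute all path lengths first and take the minimum per pair afterwards.

```python
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "d", 1)]

def shortest_paths_seminaive(edges):
    # sp(X,Y,min(D)) <- edge(X,Y,D)
    # sp(X,Y,min(D)) <- sp(X,Z,D1), edge(Z,Y,D2), D = D1 + D2
    best, delta = {}, {}
    for x, y, d in edges:
        if d < best.get((x, y), float("inf")):
            best[(x, y)] = delta[(x, y)] = d
    # Semi-naive fixpoint: only join the newly derived facts (delta).
    while delta:
        new_delta = {}
        for (x, z), d1 in delta.items():
            for z2, y, d2 in edges:
                if z2 != z:
                    continue
                d = d1 + d2
                # The min aggregate is applied *inside* the recursion
                # (the PreM transfer), pruning non-minimal derivations.
                if d < best.get((x, y), float("inf")):
                    best[(x, y)] = new_delta[(x, y)] = d
        delta = new_delta
    return best

print(shortest_paths_seminaive(edges))
# {('a','b'): 1, ('b','c'): 2, ('a','c'): 3, ('c','d'): 1,
#  ('b','d'): 3, ('a','d'): 4}
```

Because each delta round depends only on the previous round's facts, rounds can be partitioned across workers; PreM guarantees the early pruning does not change the final fixpoint, which is what makes the lock-free and decomposable parallel evaluations of the paper safe.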


2019 ◽  
Vol 9 (3) ◽  
pp. 4112-4115
Author(s):  
C. L. Vidal-Silva ◽  
E. Madariaga ◽  
T. Pham ◽  
J. M. Rubio ◽  
L. A. Urzua ◽  
...  

This article presents a comparison of the computing performance of the MapReduce tool Hadoop and of Giraph on large-scale graphs. The main ideas of MapReduce and of the bulk synchronous parallel (BSP) model are reviewed as big data computing approaches, to highlight their applicability to large-scale graph processing. The paper evaluates the execution performance of Hadoop and Giraph on the PageRank algorithm, which ranks web pages according to their relevance, and on several algorithms for finding the minimum spanning tree of a graph, with the primary goal of identifying the most efficient approach to working on large-scale graphs. Experimental results show that using Giraph to process large graphs reduces execution time by 25% compared with the results obtained with Hadoop on the same experiments. Giraph is the better option thanks to its in-memory computing approach, which avoids direct interaction with secondary memory.
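For orientation, here is a minimal sketch of the vertex-centric, superstep-based PageRank that a Pregel/Giraph-style system executes. The graph and parameter values are illustrative only; real Giraph programs are written in Java and run distributed across workers.

```python
def bsp_pagerank(adj, supersteps=30, d=0.85):
    """Synchronous, vertex-centric PageRank with Pregel-style supersteps."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(supersteps):
        # Computation phase: every vertex sends rank/out_degree messages
        # along its outgoing edges.
        inbox = {v: [] for v in adj}
        for v, neighbours in adj.items():
            for u in neighbours:
                inbox[u].append(rank[v] / len(neighbours))
        # Barrier synchronization: messages become visible only in the
        # next superstep, so all ranks update together.
        rank = {v: (1 - d) / n + d * sum(inbox[v]) for v in adj}
    return rank

# Toy graph; every vertex has at least one outgoing edge, so the sketch
# does not need special handling for dangling nodes.
adj = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(bsp_pagerank(adj))
```

Keeping the rank and inbox state in memory between supersteps, instead of writing intermediate results to disk between MapReduce jobs, is exactly the in-memory advantage the experiments above attribute to Giraph.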

