bulk synchronous parallel
Recently Published Documents


TOTAL DOCUMENTS: 116 (five years: 10)

H-INDEX: 11 (five years: 2)

2021 ◽  
Author(s):  
Xing Zhao ◽  
Manos Papagelis ◽  
Aijun An ◽  
Bao Xin Chen ◽  
Junfeng Liu ◽  
...  


2021 ◽  
Vol 11 (11) ◽  
pp. 4785
Author(s):  
Yingchi Mao ◽  
Zijian Tu ◽  
Fagang Xi ◽  
Qingyong Wang ◽  
Shufang Xu

The rapid development of artificial intelligence technology has made deep neural networks (DNNs) widely used in many fields, and DNNs keep growing in order to improve the accuracy and quality of the models. Traditional data and model parallelism are hard to scale because of communication bottlenecks and poor hardware efficiency. Pipeline parallelism, by contrast, trains multiple batches concurrently, reducing training overhead and thus achieving better acceleration. Because allocating pipeline-parallel tasks across heterogeneous computing resources is a complex problem, this paper proposes TAPP, a task-allocation method for pipeline parallelism based on deep reinforcement learning. In TAPP, a predictive network is trained by policy gradient until it produces the optimal pipeline-parallel task-allocation scheme, which speeds up model training. Experimental results show that, on average, compared with data parallelism under bulk synchronous parallel (BSP), TAPP reduces the single-step training time by a factor of 1.37 and lowers the proportion of time spent on communication by 48.92%.
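To make the mechanism concrete, here is a minimal, self-contained sketch of policy-gradient task allocation in the spirit of TAPP. Everything below (the cost model, layer and device parameters, hyperparameters) is an illustrative stand-in, not the authors' actual predictive network or reward; it only shows the REINFORCE-style loop such a method relies on.

```python
import numpy as np

rng = np.random.default_rng(0)

n_layers, n_devices = 8, 4
layer_cost = rng.uniform(1.0, 4.0, n_layers)     # toy per-layer compute cost
device_speed = rng.uniform(0.5, 1.5, n_devices)  # heterogeneous device speeds
comm_cost = 0.5                                  # toy cost per inter-device cut

# Policy: an independent softmax over devices for every layer.
logits = np.zeros((n_layers, n_devices))

def sample_assignment(logits):
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    assign = np.array([rng.choice(n_devices, p=row) for row in probs])
    return assign, probs

def step_time(assign):
    # Single-step pipeline time ~ slowest stage plus communication at cuts.
    per_device = np.zeros(n_devices)
    for layer, dev in enumerate(assign):
        per_device[dev] += layer_cost[layer] / device_speed[dev]
    cuts = np.count_nonzero(assign[1:] != assign[:-1])
    return per_device.max() + comm_cost * cuts

lr, baseline = 0.1, None
for _ in range(2000):
    assign, probs = sample_assignment(logits)
    reward = -step_time(assign)              # faster steps => higher reward
    baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
    for layer, dev in enumerate(assign):
        grad = -probs[layer]                 # REINFORCE: onehot(dev) - probs
        grad[dev] += 1.0
        logits[layer] += lr * (reward - baseline) * grad

best = logits.argmax(axis=1)
print("assignment:", best, "estimated step time:", round(step_time(best), 3))
```

The policy here factorizes over layers and the reward is a crude analytic cost; the paper's predictive network and reward signal are more sophisticated, but the update against a moving baseline is the same basic mechanism.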


Author(s):  
Rob H. Bisseling

This chapter is a self-contained tutorial which tells you how to get started with parallel programming and how to design and implement parallel algorithms in a structured way using supersteps. It introduces a simple target architecture for designing parallel algorithms, the bulk synchronous parallel (BSP) computer. Using the computation of the inner product of two vectors as an example, the chapter shows how an algorithm is designed, hand in hand with its cost analysis. The inner-product algorithm is implemented in a short program that demonstrates the most important primitives of the communication library, BSPlib. Furthermore, a benchmarking program is given for measuring the BSP parameters of a parallel computer. Its use is demonstrated on a desktop computer and a supercomputer. Finally, a parallel regular sampling sort algorithm is presented, implemented, and tested.
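The superstep structure of the inner-product algorithm can be sketched as follows: a sequential Python simulation of the pattern, assuming a block distribution of the vectors. The chapter's actual implementation is a C program built on BSPlib primitives (bsp_put, bsp_sync, and so on), which this sketch only imitates.

```python
import numpy as np

def bsp_inner_product(x, y, p):
    """Simulate the BSP inner-product pattern on p processors."""
    n = len(x)
    blocks = [(s * n // p, (s + 1) * n // p) for s in range(p)]

    # Superstep (0), computation: each processor s computes the partial
    # inner product of its own block of x and y.
    partial = [float(np.dot(x[lo:hi], y[lo:hi])) for lo, hi in blocks]

    # Communication phase: every processor "puts" its partial sum into an
    # array registered on every other processor; the barrier delivers them.
    inbox = [[partial[s] for s in range(p)] for _ in range(p)]

    # Superstep (1), computation: each processor adds the p received
    # partials, so every processor ends up holding the full inner product.
    return [sum(inbox[t]) for t in range(p)]

x = np.arange(1.0, 9.0)            # 1, 2, ..., 8
y = np.ones(8)
print(bsp_inner_product(x, y, 4))  # every processor holds 36.0
```

The cost analysis mentioned above falls out of this structure directly: one superstep of n/p multiply-adds, one all-to-all exchange of p-1 words per processor, and one superstep of p-1 additions.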


Author(s):  
Rob H. Bisseling

This book explains how to use the bulk synchronous parallel (BSP) model to design and implement parallel algorithms in the areas of scientific computing and big data. Furthermore, it presents a hybrid BSP approach towards new hardware developments such as hierarchical architectures with both shared and distributed memory. The book provides a full treatment of core problems in scientific computing and big data, starting from a high-level problem description, via a sequential solution algorithm to a parallel solution algorithm and an actual parallel program written in the communication library BSPlib. Numerical experiments are presented for parallel programs on modern parallel computers ranging from desktop computers to massively parallel supercomputers. The introductory chapter of the book gives a complete overview of BSPlib, so that readers are able to write their own parallel programs at an early stage. Furthermore, it treats BSP benchmarking and parallel sorting by regular sampling. The next three chapters treat basic numerical linear algebra problems such as linear system solving by LU decomposition, sparse matrix-vector multiplication (SpMV), and the fast Fourier transform (FFT). The final chapter explores parallel algorithms for big data problems such as graph matching. The book is accompanied by the software package BSPedupack, freely available online from the author's homepage, which contains all the programs of the book and a set of test programs.
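As a taste of the material, the following is a sequential Python sketch of parallel sorting by regular sampling (PSRS), the sorting algorithm treated in the introductory chapter. The splitter-selection rule below is one common textbook variant and the simulation ignores communication costs; the book's BSPlib version differs in detail.

```python
import numpy as np

def psrs(data, p):
    """Simulate parallel sorting by regular sampling on p processors."""
    # Step 1: split the input into p blocks and sort each block locally.
    blocks = [np.sort(b) for b in np.array_split(np.asarray(data), p)]
    # Step 2: each processor picks p regular samples from its sorted block.
    samples = np.sort(np.concatenate(
        [b[np.linspace(0, len(b) - 1, p, dtype=int)] for b in blocks]))
    # Step 3: choose p-1 global splitters from the p*p gathered samples.
    splitters = samples[p::p]
    # Step 4: partition every block by the splitters; processor j receives
    # the elements of splitter interval j from all blocks (the "exchange").
    buckets = [[] for _ in range(p)]
    for b in blocks:
        parts = np.split(b, np.searchsorted(b, splitters))
        for j in range(p):
            buckets[j].append(parts[j])
    # Step 5: each processor sorts its bucket; concatenation is the result.
    return np.concatenate([np.sort(np.concatenate(bkt)) for bkt in buckets])

rng = np.random.default_rng(1)
a = rng.integers(0, 1000, size=40)
assert np.array_equal(psrs(a, 4), np.sort(a))
print("PSRS output matches np.sort")
```

Regular sampling is what keeps the buckets balanced: because the splitters are drawn evenly from every locally sorted block, no processor receives much more than about 2n/p elements.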


Author(s):  
Xing Zhao ◽  
Manos Papagelis ◽  
Aijun An ◽  
Bao Xin Chen ◽  
Junfeng Liu ◽  
...  

2019 ◽  
Vol 184 ◽  
pp. 102319 ◽  
Author(s):  
Flavio Ferrarotti ◽  
Senén González ◽  
Klaus-Dieter Schewe

2019 ◽  
Vol 19 (5-6) ◽  
pp. 1056-1072 ◽  
Author(s):  
Ariyam Das ◽  
Carlo Zaniolo

A large class of traditional graph and data mining algorithms can be concisely expressed in Datalog, and other logic-based languages, once aggregates are allowed in recursion. In fact, for most BigData algorithms, the difficult semantic issues raised by the use of non-monotonic aggregates in recursion are solved by Pre-Mappability (PreM), a property that assures that for a program with aggregates in recursion there is an equivalent aggregate-stratified program. In this paper we show that, by bringing together the formal abstract semantics of stratified programs with the efficient operational one of unstratified programs, PreM can also facilitate and improve their parallel execution. We prove that PreM-optimized lock-free and decomposable parallel semi-naive evaluations produce the same results as the single-executor programs. Therefore, PreM can be assimilated into the data-parallel computation plans of different distributed systems, irrespective of whether these follow bulk synchronous parallel (BSP) or asynchronous computing models. In addition, we show that non-linear recursive queries can be evaluated using a hybrid stale synchronous parallel (SSP) model on distributed environments. After providing a formal correctness proof for the recursive query evaluation with PreM under this relaxed synchronization model, we present experimental evidence of its benefits.
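The core idea is easy to see on a toy example. In the following Python sketch (the shortest-path program is the standard textbook example, not the paper's benchmark), the min aggregate is applied inside a semi-naive fixpoint computation, PreM-style, yet it produces the same answer as the aggregate-stratified program that would compute all path lengths first and take the minimum per pair afterwards.

```python
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "d", 1)]

def shortest_paths_seminaive(edges):
    # sp(X,Y,min(D)) <- edge(X,Y,D)
    # sp(X,Y,min(D)) <- sp(X,Z,D1), edge(Z,Y,D2), D = D1 + D2
    best, delta = {}, {}
    for x, y, d in edges:
        if d < best.get((x, y), float("inf")):
            best[(x, y)] = delta[(x, y)] = d
    # Semi-naive fixpoint: only join the newly derived facts (delta).
    while delta:
        new_delta = {}
        for (x, z), d1 in delta.items():
            for z2, y, d2 in edges:
                if z2 != z:
                    continue
                d = d1 + d2
                # The min aggregate is applied *inside* the recursion
                # (the PreM transfer), pruning non-minimal derivations.
                if d < best.get((x, y), float("inf")):
                    best[(x, y)] = new_delta[(x, y)] = d
        delta = new_delta
    return best

print(shortest_paths_seminaive(edges))
# {('a','b'): 1, ('b','c'): 2, ('a','c'): 3, ('c','d'): 1,
#  ('b','d'): 3, ('a','d'): 4}
```

Because each delta round depends only on the previous round's facts, rounds can be partitioned across workers; PreM guarantees the early pruning does not change the final fixpoint, which is what makes the lock-free and decomposable parallel evaluations of the paper safe.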


2019 ◽  
Vol 9 (3) ◽  
pp. 4112-4115
Author(s):  
C. L. Vidal-Silva ◽  
E. Madariaga ◽  
T. Pham ◽  
J. M. Rubio ◽  
L. A. Urzua ◽  
...  

This article presents a comparison of the computing performance of the MapReduce tool Hadoop and of Giraph on large-scale graphs. The main ideas of MapReduce and of the bulk synchronous parallel (BSP) model are reviewed as big data computing approaches, to highlight their applicability to large-scale graph processing. The paper evaluates the execution performance of Hadoop and Giraph on the PageRank algorithm, which ranks web pages according to their relevance, and on several algorithms for finding the minimum spanning tree of a graph, with the primary goal of identifying the most efficient approach to working on large-scale graphs. Experimental results show that using Giraph to process large graphs reduces execution time by 25% compared with the results obtained with Hadoop on the same experiments. Giraph is the better option thanks to its in-memory computing approach, which avoids direct interaction with secondary memory.
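For orientation, here is a minimal sketch of the vertex-centric, superstep-based PageRank that a Pregel/Giraph-style system executes. The graph and parameter values are illustrative only; real Giraph programs are written in Java and run distributed across workers.

```python
def bsp_pagerank(adj, supersteps=30, d=0.85):
    """Synchronous, vertex-centric PageRank with Pregel-style supersteps."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(supersteps):
        # Computation phase: every vertex sends rank/out_degree messages
        # along its outgoing edges.
        inbox = {v: [] for v in adj}
        for v, neighbours in adj.items():
            for u in neighbours:
                inbox[u].append(rank[v] / len(neighbours))
        # Barrier synchronization: messages become visible only in the
        # next superstep, so all ranks update together.
        rank = {v: (1 - d) / n + d * sum(inbox[v]) for v in adj}
    return rank

# Toy graph; every vertex has at least one outgoing edge, so the sketch
# does not need special handling for dangling nodes.
adj = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(bsp_pagerank(adj))
```

Keeping the rank and inbox state in memory between supersteps, instead of writing intermediate results to disk between MapReduce jobs, is exactly the in-memory advantage the experiments above attribute to Giraph.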

