Iteratively solving sparse linear system based on PaRSEC task scheduling

As new architectures and programming paradigms such as task-based scheduling emerge in parallel high-performance computing, it is important to exploit these features when tuning monolithic computing codes. In this article, classical conjugate gradient algorithms in Krylov subspace, targeting the sparse linear system Ax = b, are pipelined to execute interdependent tasks on the Parallel Runtime Scheduling and Execution Controller (PaRSEC) runtime. First, the sparse matrix A is split by rows to expose more coarse-grained parallelism. Second, to eliminate communication overhead, the partitioned sub-vectors are not assembled into one full vector in RAM before running sparse matrix–vector product (SpMV) operations. Moreover, in the SpMV computation, if all elements of a column in a split sub-matrix are zero, the corresponding product operations can be removed by reorganizing the sub-vectors. Finally, on GPUs, the latency of migrating sub-vectors is partially overlapped with SpMV execution by further splitting the sparse matrix by columns. A series of experiments demonstrates that the pipelined task scheduling on the PaRSEC runtime achieves optimal speedup and higher pipelining efficiency. Fusing SpMV concurrency with dot product pipelining can achieve higher speedup and efficiency.
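The row splitting and zero-column removal described in this abstract can be illustrated with a minimal serial sketch in plain Python. It is not the article's PaRSEC implementation: the function names (`split_rows`, `compact_block`, `block_spmv`, `conjugate_gradient`) and the dict-per-row matrix layout are assumptions made for illustration, the blocks run one after another rather than as scheduled tasks, and the vector is kept as one full list for simplicity, whereas the article avoids assembling it.

```python
from typing import Dict, List, Tuple

Row = Dict[int, float]  # column index -> nonzero value (hypothetical layout)


def split_rows(rows: List[Row], n_blocks: int) -> List[List[Row]]:
    """Split the matrix into contiguous row blocks of near-equal size."""
    n = len(rows)
    bounds = [n * k // n_blocks for k in range(n_blocks + 1)]
    return [rows[bounds[k]:bounds[k + 1]] for k in range(n_blocks)]


def compact_block(block: List[Row]) -> Tuple[List[int], List[Row]]:
    """Drop all-zero columns: keep only the columns that hold a nonzero
    somewhere in the block and renumber them 0..m-1."""
    cols = sorted({c for row in block for c in row})
    local = {c: j for j, c in enumerate(cols)}
    remapped = [{local[c]: v for c, v in row.items()} for row in block]
    return cols, remapped


def block_spmv(cols: List[int], remapped: List[Row], x: List[float]) -> List[float]:
    """One SpMV unit of work: gather the needed entries of x, then multiply."""
    sub_x = [x[c] for c in cols]  # reorganized sub-vector
    return [sum(v * sub_x[j] for j, v in row.items()) for row in remapped]


def spmv(blocks, x: List[float]) -> List[float]:
    """Concatenate the per-block products (independent, so parallelizable)."""
    y: List[float] = []
    for cols, remapped in blocks:
        y.extend(block_spmv(cols, remapped, x))
    return y


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(rows, b, n_blocks=2, tol=1e-10, max_iter=100):
    """Textbook CG for a symmetric positive definite A, using the block SpMV."""
    blocks = [compact_block(blk) for blk in split_rows(rows, n_blocks)]
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x with x = 0
    p = list(r)          # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        ap = spmv(blocks, p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

Calling `conjugate_gradient(rows, b, n_blocks=k)` on a symmetric positive definite matrix given as a list of row dicts returns the approximate solution. Each `block_spmv` call depends only on its own block and the current direction vector, which is the kind of independent unit a task runtime could schedule concurrently.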

Numerical Evaluations of Parallelization Efficiencies of Communication Avoiding Krylov Subspace Method for Large Sparse Linear System

International Conference on Computational & Experimental Engineering and Sciences ◽

10.32604/icces.2019.05496 ◽

2019 ◽

Vol 21 (2) ◽

pp. 43-43

Author(s):

Akira Matsumoto ◽

Taku Itoh ◽

Soichiro Ikuno

Keyword(s):

Linear System ◽

Krylov Subspace ◽

Krylov Subspace Method ◽

Subspace Method ◽

Sparse Linear System

The Combinatorial BLAS: design, implementation, and applications

The International Journal of High Performance Computing Applications ◽

10.1177/1094342011403516 ◽

2011 ◽

Vol 25 (4) ◽

pp. 496-509 ◽

Cited By ~ 187

Author(s):

Aydın Buluç ◽

John R Gilbert

Keyword(s):

Data Mining ◽

High Performance ◽

Web Search ◽

Sparse Matrix ◽

Ease Of Use ◽

Coarse Grained ◽

Matrix Methods ◽

The Right ◽

Traditional Approaches ◽

Combinatorial Graphs

This paper presents a scalable high-performance software library to be used for graph analysis and data mining. Large combinatorial graphs appear in many applications of high-performance computing, including computational biology, informatics, analytics, web search, dynamical systems, and sparse matrix methods. Graph computations are difficult to parallelize using traditional approaches due to their irregular nature and low operational intensity. Many graph computations, however, contain sufficient coarse-grained parallelism for thousands of processors, which can be uncovered by using the right primitives. We describe the parallel Combinatorial BLAS, which consists of a small but powerful set of linear algebra primitives specifically targeting graph and data mining applications. We provide an extensible library interface and some guiding principles for future development. The library is evaluated using two important graph algorithms, in terms of both performance and ease-of-use. The scalability and raw performance of the example applications, using the Combinatorial BLAS, are unprecedented on distributed memory clusters.
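To make the idea of expressing graph algorithms through linear algebra primitives concrete, here is a minimal sketch of breadth-first search written as repeated sparse matrix-vector products over a Boolean (OR, AND) semiring. It is plain Python and does not use the Combinatorial BLAS interface; the function name `bfs_levels` and the column-as-set adjacency layout are assumptions chosen for illustration.

```python
from typing import Dict, List, Set


def bfs_levels(columns: Dict[int, Set[int]], source: int, n: int) -> List[int]:
    """Return the BFS level of every vertex, or -1 if unreachable.

    columns[j] is the set of row indices i with A[i][j] != 0, i.e. the
    vertices reachable from j by one edge (hypothetical storage layout).
    """
    level = [-1] * n
    level[source] = 0
    frontier = {source}          # sparse Boolean vector x
    depth = 0
    while frontier:
        depth += 1
        # y = A x over (OR, AND): the union of the columns selected by x.
        reached: Set[int] = set()
        for j in frontier:
            reached |= columns.get(j, set())
        # Mask out vertices that already have a level.
        frontier = {i for i in reached if level[i] == -1}
        for i in frontier:
            level[i] = depth
    return level
```

Each loop iteration is one sparse matrix-sparse vector product followed by a mask, so all the work at a given level can proceed in parallel regardless of how irregular the graph is.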

High-Performance Fortran and possible extensions to support conjugate gradient algorithms

Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing HPDC-96 ◽

10.1109/hpdc.1996.546175 ◽

1996 ◽

Author(s):

K. Dincer ◽

G.C. Fox ◽

K. Hawick

Keyword(s):

Conjugate Gradient ◽

High Performance ◽

Conjugate Gradient Algorithms ◽

High Performance Fortran ◽

Gradient Algorithms

A cooperative DDoS attack detection scheme based on entropy and ensemble learning in SDN

EURASIP Journal on Wireless Communications and Networking ◽

10.1186/s13638-021-01957-9 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Shanshan Yu ◽

Jicheng Zhang ◽

Ju Liu ◽

Xiaoqing Zhang ◽

Yafeng Li ◽

...

Keyword(s):

Ensemble Learning ◽

Denial Of Service ◽

Attack Detection ◽

Coarse Grained ◽

Communication Overhead ◽

Detection Scheme ◽

Fine Grained ◽

DDoS Attack ◽

Network Status ◽

DDoS Attack Detection

To solve the problem of distributed denial of service (DDoS) attack detection in software-defined networks, we propose a cooperative DDoS attack detection scheme based on entropy and ensemble learning. The method places a coarse-grained, entropy-based preliminary detection module in the edge switch, which monitors the network status in real time and reports to the controller if any abnormality is found. A fine-grained attack detection module in the controller then uses an ensemble learning-based algorithm to identify abnormal traffic more accurately. Following the idea of edge computing, the framework uses the idle computing capability of edge switches to offload part of the detection task from the control plane to the data plane. Simulation results for two common DDoS attack methods, ICMP and SYN, show that the system can effectively detect DDoS attacks and greatly reduce the southbound communication overhead, the burden on the controller, and the detection delay.
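A coarse-grained entropy check of the kind described here can be sketched as follows. The abstract does not say which traffic feature or threshold the authors use, so this example assumes the destination address as the feature, a normalized entropy threshold of 0.5, and a minimum window size of 20 packets; all three, along with the function names, are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def shannon_entropy(items: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalized_entropy(items: Sequence[Hashable]) -> float:
    """Entropy scaled to [0, 1] by the maximum for the observed distinct values."""
    distinct = len(set(items))
    if distinct <= 1:
        return 0.0
    return shannon_entropy(items) / math.log2(distinct)


def is_suspicious(dst_addrs: Sequence[str],
                  threshold: float = 0.5,   # assumed value
                  min_packets: int = 20) -> bool:  # assumed value
    """Flag a window whose destination addresses are unusually concentrated.

    Many packets aimed at few destinations lower the entropy, which is the
    signal a switch-side module could report for closer inspection.
    """
    if len(dst_addrs) < min_packets:
        return False  # too few packets to judge
    return normalized_entropy(dst_addrs) < threshold
```

A window flagged by `is_suspicious` would only be a candidate for the fine-grained stage, so a cheap check with some false positives is acceptable at this level.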

Coupling multi-level component interfaces for parallel sparse linear system solvers

Proceedings of the 2009 Workshop on Component-Based High Performance Computing - CBHPC '09 ◽

10.1145/1687774.1687779 ◽

2009 ◽

Author(s):

Fang Liu ◽

Masha Sosonkina ◽

Dane Coffey

Keyword(s):

Linear System ◽

Sparse Linear System ◽

Multi Level

Termination and equivalence results for conjugate gradient algorithms

Mathematical Programming ◽

10.1007/bf02591730 ◽

1984 ◽

Vol 29 (1) ◽

pp. 64-76

Author(s):

A. Buckley

Keyword(s):

Conjugate Gradient ◽

Conjugate Gradient Algorithms ◽

Gradient Algorithms

Optimization procedure for algorithms of task scheduling in high performance heterogeneous distributed computing systems

Egyptian Informatics Journal ◽

10.1016/j.eij.2011.10.001 ◽

2011 ◽

Vol 12 (3) ◽

pp. 219-229 ◽

Cited By ~ 5

Author(s):

Nirmeen A. Bahnasawy ◽

Fatma Omara ◽

Magdy A. Koutb ◽

Mervat Mosa

Keyword(s):

Distributed Computing ◽

Task Scheduling ◽

High Performance ◽

Optimization Procedure ◽

Distributed Computing Systems ◽

Computing Systems ◽

Heterogeneous Distributed Computing ◽

Heterogeneous Distributed Computing Systems

Efficient generalized conjugate gradient algorithms, part 1: Theory

Journal of Optimization Theory and Applications ◽

10.1007/bf00940464 ◽

1991 ◽

Vol 69 (1) ◽

pp. 129-137 ◽

Cited By ~ 222

Author(s):

Y. Liu ◽

C. Storey

Keyword(s):

Conjugate Gradient ◽

Conjugate Gradient Algorithms ◽

Gradient Algorithms

A Survey Paper on Task Scheduling Methods in Cluster Computing Environment for High Performance

2015 Fifth International Conference on Advanced Computing & Communication Technologies ◽

10.1109/acct.2015.64 ◽

2015 ◽

Author(s):

Harvinder Singh ◽

Gurdev Singh

Keyword(s):

Task Scheduling ◽

High Performance ◽

Cluster Computing ◽

Computing Environment ◽

Survey Paper

You Only Traverse Twice: A YOTT Placement, Routing, and Timing Approach for CGRAs

ACM Transactions on Embedded Computing Systems ◽

10.1145/3477038 ◽

2021 ◽

Vol 20 (5s) ◽

pp. 1-25

Author(s):

Michael Canesche ◽

Westerley Carvalho ◽

Lucas Reis ◽

Matheus Oliveira ◽

Salles Magalhães ◽

...

Keyword(s):

Execution Time ◽

High Performance ◽

Coarse Grained ◽

Optimal Placement ◽

Greedy Heuristics ◽

High Quality ◽

Solution Quality ◽

Graph Traversal ◽

Trade Offs ◽

Graph Properties

Coarse-grained reconfigurable architecture (CGRA) mapping involves three main steps: placement, routing, and timing. The mapping is an NP-complete problem, and a common strategy is to decouple this process into its independent steps. This work focuses on the placement step, and its aim is to propose a technique that is both reasonably fast and leads to high-performance solutions. Furthermore, a near-optimal placement simplifies the following routing and timing steps. Exact solutions cannot find placements in a reasonable execution time as input designs increase in size. Heuristic solutions include meta-heuristics, such as Simulated Annealing (SA), and fast and straightforward greedy heuristics based on graph traversal. However, as these approaches are probabilistic and have a large design space, it is not easy to provide both run-time efficiency and good solution quality. We propose a graph traversal heuristic that provides the best of both: high-quality placements similar to SA and the execution time of graph traversal approaches. Our placement introduces novel ideas based on the “you only traverse twice” (YOTT) approach, which performs a two-step graph traversal. The first traversal generates annotated data to guide the second step, which greedily performs the placement, node by node, aided by the annotated data and target architecture constraints. We introduce three new concepts to implement this technique: I/O and reconvergence annotation, degree matching, and look-ahead placement. Our analysis of this approach explores the placement execution time/quality trade-offs. We point out insights on how to analyze graph properties during dataflow mapping. Our results show that YOTT is 60.6×, 9.7×, and 2.3× faster than a high-quality SA, bounding box SA VPR, and multi-single traversal placements, respectively. Furthermore, YOTT reduces the average wire length and the maximal FIFO size (an additional timing requirement on CGRAs) to avoid delay mismatches in fully pipelined architectures.
