sparsity pattern

2021 ◽  
Vol 14 (4) ◽  
pp. 1-28
Author(s):  
Tao Yang ◽  
Zhezhi He ◽  
Tengchuan Kou ◽  
Qingzheng Li ◽  
Qi Han ◽  
...  

Field-programmable gate arrays (FPGAs) are a high-performance computing platform for convolutional neural network (CNN) inference. The Winograd algorithm, weight pruning, and quantization are widely adopted to reduce the storage and arithmetic overhead of CNNs on FPGAs. Recent studies prune the weights in the Winograd domain; however, this results in irregular sparse patterns, which lead to low parallelism and reduced resource utilization. Moreover, few works discuss a suitable quantization scheme for Winograd. In this article, we propose a regular sparse pruning pattern for Winograd-based CNNs, the Sub-row-balanced Sparsity (SRBS) pattern, to overcome the challenge of irregular sparse patterns. We then develop a two-step hardware co-optimization approach to improve the model accuracy using the SRBS pattern. Based on the pruned model, we implement mixed-precision quantization to further reduce the computational complexity of bit operations. Finally, we design an FPGA accelerator that exploits both the SRBS pattern, to eliminate low-parallelism computation and irregular memory accesses, and the mixed-precision quantization, to obtain a layer-wise bit width. Experimental results on VGG16/VGG-nagadomi with CIFAR-10 and ResNet-18/34/50 with ImageNet show up to 11.8×/8.67× and 8.17×/8.31×/10.6× speedup, and 12.74×/9.19× and 8.75×/8.81×/11.1× energy efficiency improvement, respectively, compared with the state-of-the-art dense Winograd accelerator [20], with negligible loss of model accuracy. We also show that our design achieves a 4.11× speedup over the state-of-the-art sparse Winograd accelerator [19] on VGG16.
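The abstract does not define the SRBS pattern in detail, so the following is only a rough sketch of one balanced-pruning scheme in that spirit. It assumes that each row is split into fixed-length sub-rows and that every sub-row keeps the same number of largest-magnitude weights. The function name `prune_sub_row_balanced` and the parameters `sub_len` and `keep` are hypothetical and are not taken from the paper.

```python
def prune_sub_row_balanced(row, sub_len, keep):
    """Zero all but the `keep` largest-magnitude entries of each sub-row.

    Assumption: this is an illustrative balanced-pruning rule, not the
    exact SRBS definition from the paper.
    """
    if len(row) % sub_len != 0:
        raise ValueError("row length must be a multiple of sub_len")
    pruned = []
    for start in range(0, len(row), sub_len):
        sub = row[start:start + sub_len]
        # Positions of the `keep` largest magnitudes inside this sub-row.
        order = sorted(range(sub_len), key=lambda i: abs(sub[i]), reverse=True)
        kept = set(order[:keep])
        pruned.extend(v if i in kept else 0.0 for i, v in enumerate(sub))
    return pruned


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.8, 0.01]
pruned = prune_sub_row_balanced(weights, sub_len=4, keep=2)
# Every sub-row of length 4 now holds exactly 2 nonzeros.
nonzeros_per_sub_row = [
    sum(1 for v in pruned[s:s + 4] if v != 0.0) for s in range(0, len(pruned), 4)
]
```

Because every sub-row carries the same number of nonzeros, each processing lane would receive the same amount of work, which is the regularity an irregular pattern lacks.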


Fluids ◽  
2021 ◽  
Vol 6 (10) ◽  
pp. 355
Author(s):  
Timur Imankulov ◽  
Danil Lebedev ◽  
Bazargul Matkerim ◽  
Beimbet Daribayev ◽  
Nurislam Kassymbek

Newton’s method is widely used in simulating multiphase, multicomponent flow in porous media. The generalized minimal residual method (GMRES) is often used to solve the systems of linear equations that arise in such problems. This paper analyzes the one-dimensional problem of multicomponent fluid flow in a porous medium and solves the system of algebraic equations with the Newton-GMRES method. We solved the linear equations with three algorithms: GMRES; GMRES with restarts after every m steps, GMRES(m); and GMRES preconditioned with incomplete lower-upper factorization in which the factors L and U have the same sparsity pattern as the original matrix, ILU(0)-GMRES. We then compared their computation time and convergence. The study showed how the preconditioner and the restarts of GMRES(m) affect the computation time; in particular, both were able to speed up the program.
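The defining property of ILU(0), that L and U share the sparsity pattern of the original matrix, can be illustrated with a small sketch. It covers only the factorization step, not GMRES or the Newton iteration, and it uses dense lists of lists for readability, which a real solver would not do. The function name `ilu0` and the test matrix `A` are illustrative assumptions and do not come from the paper.

```python
def ilu0(a):
    """ILU(0): Gaussian elimination that only updates positions where A is nonzero."""
    n = len(a)
    pattern = [[a[i][j] != 0 for j in range(n)] for i in range(n)]
    lu = [row[:] for row in a]
    for i in range(1, n):
        for k in range(i):
            if not pattern[i][k]:
                continue  # no multiplier is stored outside the pattern
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                if pattern[i][j]:  # fill-in outside the pattern is dropped
                    lu[i][j] -= lu[i][k] * lu[k][j]
    lower = [[lu[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    upper = [[lu[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return lower, upper


# Laplacian of a 2x2 grid: an exact LU factorization would create fill-in.
A = [[4.0, -1.0, -1.0, 0.0],
     [-1.0, 4.0, 0.0, -1.0],
     [-1.0, 0.0, 4.0, -1.0],
     [0.0, -1.0, -1.0, 4.0]]
L, U = ilu0(A)
n = len(A)
# L and U are zero wherever A is zero.
same_pattern = all(L[i][j] == 0 and U[i][j] == 0
                   for i in range(n) for j in range(n) if A[i][j] == 0)
```

The product of L and U reproduces A on the nonzero positions of A and differs only where fill-in was dropped, which is why the factors are cheap to store but only approximate the inverse.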


Author(s):  
Andreas Dedner ◽  
Tristan Pryer

We extend the finite element method introduced by Lakkis and Pryer (SIAM J. Sci. Comput. 33(2): 786–801, 2011) to approximate the solution of second-order elliptic problems in nonvariational form to incorporate the discontinuous Galerkin (DG) framework. This is done by viewing the “finite element Hessian” as an auxiliary variable in the formulation. Representing the finite element Hessian in a discontinuous setting yields a linear system of the same size and with the same sparsity pattern as that of the compact DG methods for variational elliptic problems. Furthermore, the system matrix is very easy to assemble; thus, this approach greatly reduces the computational complexity of the discretisation compared to the continuous approach. We conduct a stability and consistency analysis making use of the unified framework set out in Arnold et al. (SIAM J. Numer. Anal. 39(5): 1749–1779, 2001/2002). We also give an a posteriori analysis of the method in the case where the problem has a strong solution. The analysis applies to any consistent representation of the finite element Hessian, and thus is applicable to the previous works making use of continuous Galerkin approximations. Numerical evidence is presented showing that the method also works well in a more general setting.


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Wenpeng Ma ◽  
Yiwen Hu ◽  
Wu Yuan ◽  
Xiazhen Liu

Solving sparse triangular systems is the building block for incomplete LU (ILU) based preconditioning, but parallel algorithms, such as the level-scheduling scheme, are sometimes limited by the parallelism that can be extracted from the sparsity pattern. In this study, the block version of the incomplete sparse approximate inverses (ISAI) algorithm is studied, and block-ISAI is considered for preconditioning by proposing an efficient algorithm and implementation on graphics processing unit (GPU) accelerators. Performance comparisons are carried out between the proposed algorithm and the serial and parallel block triangular solvers from the PETSc and cuSPARSE libraries. The experimental results show that GMRES(30) with the proposed block-ISAI preconditioning achieves speedups of 1.4×–6.9× over GMRES(30) using the cuSPARSE library on an NVIDIA Tesla V100 GPU.
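To see why level scheduling depends on the sparsity pattern, consider this minimal sketch, which groups the rows of a lower triangular matrix into levels that could be solved concurrently. It is a plain serial illustration, not the block-ISAI algorithm of the paper and not PETSc or cuSPARSE code. The function name `level_schedule` and the input layout (one list of column indices per row) are assumptions made for the example.

```python
def level_schedule(lower_rows):
    """Group rows of a lower triangular matrix into dependency levels.

    lower_rows[i] lists the columns j < i where L[i][j] is nonzero.
    Rows in the same level do not depend on each other.
    """
    n = len(lower_rows)
    level = [0] * n
    for i in range(n):
        deps = [j for j in lower_rows[i] if j < i]
        # A row sits one level after the deepest row it depends on.
        level[i] = 1 + max((level[j] for j in deps), default=-1)
    levels = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        levels[lv].append(i)
    return levels


# A pattern with some independence: two rows can be solved per level.
wide = level_schedule([[], [], [0], [1], [2, 3]])
# A bidiagonal pattern: each row waits for the previous one.
chain = level_schedule([[], [0], [1], [2]])
```

The bidiagonal case produces one row per level, so no parallelism can be extracted from it, which is the limitation that motivates approximate-inverse preconditioners.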


Author(s):  
Mustafa D. Kaba ◽  
Mengnan Zhao ◽  
Rene Vidal ◽  
Daniel P. Robinson ◽  
Enrique Mallada

Author(s):  
Rob H. Bisseling

This chapter introduces irregular algorithms and presents the example of parallel sparse matrix-vector multiplication (SpMV), which is the central operation in iterative linear system solvers. The irregular sparsity pattern of the matrix does not change during the multiplication, which may be repeated many times. This justifies putting a lot of effort into finding a good data distribution. The Mondriaan distribution of a sparse matrix is a useful non-Cartesian distribution that can be found by hypergraph-based partitioning. The Mondriaan package implements such a partitioning and also the newer medium-grain partitioning method. The chapter analyses the special cases of random sparse matrices and Laplacian matrices. It uses performance profiles and geometric means to compare different partitioning methods. Furthermore, it presents the hybrid-BSP model and a hybrid-BSP SpMV, which are aimed at hybrid distributed/shared-memory architectures. The parallel SpMV can be incorporated in applications, ranging from PageRank computation to artificial neural networks.
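For reference, a serial SpMV over a matrix in compressed sparse row (CSR) form can be sketched as follows. This is not the parallel BSP algorithm from the chapter and involves no Mondriaan partitioning; it only shows the kernel whose fixed sparsity pattern (`row_ptr`, `col_idx`) is reused across repeated multiplications. The function name `spmv_csr` and the small example matrix are illustrative assumptions.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A x for a matrix A stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


# A = [[2, 0, 1],
#      [0, 3, 0],
#      [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [2.0, 1.0, 3.0, 4.0, 5.0]

y = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
# The pattern arrays are unchanged, so a second multiplication reuses them.
y2 = spmv_csr(row_ptr, col_idx, values, y)
```

In a parallel setting, the access `x[col_idx[k]]` is what causes communication when the needed vector component lives on another processor, which is why the data distribution matters.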

