Autotuning Parallel Programs by Model Checking

The paper presents a new approach to autotuning data-parallel programs. Autotuning is a search for optimal program settings which maximize its performance. The novelty of the approach lies in the use of the model checking method to find the optimal tuning parameters by the method of counterexamples. In our work, we abstract from specific programs and specific processors by defining their representative abstract patterns. Our method of counterexamples implements the following four steps. At the first step, an execution model of an abstract program on an abstract processor is described in the language of a model checking tool. At the second step, in the language of the model checking tool, we formulate the optimality property that depends on the constructed model. At the third step, we find the optimal values of the tuning parameters by using a counterexample constructed during the verification of the optimality property. In the fourth step, we extract the information about the tuning parameters from the counter-example for the optimal parameters. We apply this approach to autotuning parallel programs written in OpenCL, a popular modern language that extends the C language for programming both standard multi-core processors (CPUs) and massively parallel graphics processing units (GPUs). As a verification tool, we use the SPIN verifier and its model representation language Promela, whose formal semantics is good for modelling the execution of parallel programs on processors with different architectures.

Download Full-text

A Low-Power Spike-Like Neural Network Design

Electronics ◽

10.3390/electronics8121479 ◽

2019 ◽

Vol 8 (12) ◽

pp. 1479 ◽

Cited By ~ 1

Author(s):

Michael Losh ◽

Daniel Llamocca

Keyword(s):

Neural Network ◽

Graphics Processing Units ◽

Hybrid Network ◽

Network Resource ◽

Performance Requirements ◽

Reduction Methods ◽

Sum Of Products ◽

Parallel Graphics ◽

Hardware Platforms ◽

Graphics Processing

Modern massively-parallel Graphics Processing Units (GPUs) and Machine Learning (ML) frameworks enable neural network implementations of unprecedented performance and sophistication. However, state-of-the-art GPU hardware platforms are extremely power-hungry, while microprocessors cannot achieve the performance requirements. Biologically-inspired Spiking Neural Networks (SNN) have inherent characteristics that lead to lower power consumption. We thus present a bit-serial SNN-like hardware architecture. By using counters, comparators, and an indexing scheme, the design effectively implements the sum-of-products inherent in neurons. In addition, we experimented with various strength-reduction methods to lower neural network resource usage. The proposed Spiking Hybrid Network (SHiNe), validated on an FPGA, has been found to achieve reasonable performance with a low resource utilization, with some trade-off with respect to hardware throughput and signal representation.

Download Full-text

DecGPU: distributed error correction on massively parallel graphics processing units using CUDA and MPI

BMC Bioinformatics ◽

10.1186/1471-2105-12-85 ◽

2011 ◽

Vol 12 (1) ◽

Cited By ~ 36

Author(s):

Yongchao Liu ◽

Bertil Schmidt ◽

Douglas L Maskell

Keyword(s):

Error Correction ◽

Graphics Processing Units ◽

Massively Parallel ◽

Parallel Graphics ◽

Graphics Processing

Download Full-text

High performance computing on graphics processing units

Pollack Periodica ◽

10.1556/pollack.3.2008.2.3 ◽

2008 ◽

Vol 3 (2) ◽

pp. 27-34 ◽

Cited By ~ 2

Author(s):

Balázs Tukora ◽

Tibor Szalay

Keyword(s):

High Performance Computing ◽

Graphics Processing Units ◽

High Performance ◽

Graphics Processing ◽

Performance Computing

Download Full-text

Parallel Option Pricing with Fourier Space Time-Stepping Method on Graphics Processing Units

SSRN Electronic Journal ◽

10.2139/ssrn.1020207 ◽

2007 ◽

Cited By ~ 1

Author(s):

Vladimir Surkov

Keyword(s):

Option Pricing ◽

Graphics Processing Units ◽

Space Time ◽

Fourier Space ◽

Time Stepping ◽

Graphics Processing

Download Full-text

Improving the Efficiency and the Accuracy of 2D Gel Electrophoresis Spot Detection Using Graphics Processing Units

Current Bioinformatics ◽

10.2174/1574893612666170725141905 ◽

2018 ◽

Vol 13 (2) ◽

pp. 193-206 ◽

Cited By ~ 1

Author(s):

Marwa K. Elteir ◽

Shaheera A. Rashwan ◽

Ashraf A. Khalil

Keyword(s):

Gel Electrophoresis ◽

Graphics Processing Units ◽

2D Gel Electrophoresis ◽

Spot Detection ◽

2D Gel ◽

Graphics Processing

Download Full-text

Using graphics processing units on the cloud to accelerate and reduce processing cost of parameters estimation of seismic processing algorithm

10.22564/16cisbgf2019.221 ◽

2019 ◽

Author(s):

Nicholas Okita ◽

Tiago Coimbra ◽

José Ribeiro ◽

Martin Tygel

Keyword(s):

Graphics Processing Units ◽

Parameters Estimation ◽

Processing Algorithm ◽

Seismic Processing ◽

Processing Cost ◽

Graphics Processing

Download Full-text

Review of smoothed particle hydrodynamics: towards converged Lagrangian flow modelling

Proceedings of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rspa.2019.0801 ◽

2020 ◽

Vol 476 (2241) ◽

pp. 20190801

Author(s):

Steven J. Lind ◽

Benedict D. Rogers ◽

Peter K. Stansby

Keyword(s):

Smoothed Particle Hydrodynamics ◽

Graphics Processing Units ◽

Wave Structure ◽

Free Form ◽

Mesh Free ◽

Weakly Compressible ◽

Particle Hydrodynamics ◽

Massively Parallel Computing ◽

Smoothed Particle ◽

Graphics Processing

This paper presents a review of the progress of smoothed particle hydrodynamics (SPH) towards high-order converged simulations. As a mesh-free Lagrangian method suitable for complex flows with interfaces and multiple phases, SPH has developed considerably in the past decade. While original applications were in astrophysics, early engineering applications showed the versatility and robustness of the method without emphasis on accuracy and convergence. The early method was of weakly compressible form resulting in noisy pressures due to spurious pressure waves. This was effectively removed in the incompressible (divergence-free) form which followed; since then the weakly compressible form has been advanced, reducing pressure noise. Now numerical convergence studies are standard. While the method is computationally demanding on conventional processors, it is well suited to parallel processing on massively parallel computing and graphics processing units. Applications are diverse and encompass wave–structure interaction, geophysical flows due to landslides, nuclear sludge flows, welding, gearbox flows and many others. In the state of the art, convergence is typically between the first- and second-order theoretical limits. Recent advances are improving convergence to fourth order (and higher) and these will also be outlined. This can be necessary to resolve multi-scale aspects of turbulent flow.

Download Full-text

Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software

ACM Transactions on Mathematical Software ◽

10.1145/3441850 ◽

2021 ◽

Vol 47 (2) ◽

pp. 1-28

Author(s):

Goran Flegar ◽

Hartwig Anzt ◽

Terry Cojean ◽

Enrique S. Quintana-Ortí

Keyword(s):

Linear Algebra ◽

Graphics Processing Units ◽

High Performance ◽

Numerical Algorithms ◽

Mixed Precision ◽

Before And After ◽

Memory Accesses ◽

Specialized Hardware ◽

The Individual ◽

Graphics Processing

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing its data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator–like a preconditioner–in lower than working precision hopefully without impacting the algorithm output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner which selects the precision format used to store the preconditioner data on-the-fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats which optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.

Download Full-text

DSPSR: Digital Signal Processing Software for Pulsar Astronomy

Publications of the Astronomical Society of Australia ◽

10.1071/as10021 ◽

2011 ◽

Vol 28 (1) ◽

pp. 1-14 ◽

Cited By ~ 172

Author(s):

W. van Straten ◽

M. Bailes

Keyword(s):

Signal Processing ◽

Digital Signal Processing ◽

Graphics Processing Units ◽

High Performance ◽

Digital Signal ◽

General Purpose ◽

Design Decisions ◽

Extensive Range ◽

Processing Software ◽

Graphics Processing

Abstractdspsr is a high-performance, open-source, object-oriented, digital signal processing software library and application suite for use in radio pulsar astronomy. Written primarily in C++, the library implements an extensive range of modular algorithms that can optionally exploit both multiple-core processors and general-purpose graphics processing units. After over a decade of research and development, dspsr is now stable and in widespread use in the community. This paper presents a detailed description of its functionality, justification of major design decisions, analysis of phase-coherent dispersion removal algorithms, and demonstration of performance on some contemporary microprocessor architectures.

Download Full-text

Practical acceleration of direct whole-core calculation employing graphics processing units

Progress in Nuclear Energy ◽

10.1016/j.pnucene.2021.103631 ◽

2021 ◽

Vol 133 ◽

pp. 103631

Author(s):

Namjae Choi ◽

Junsu Kang ◽

Han Gyu Lee ◽

Han Gyu Joo

Keyword(s):

Graphics Processing Units ◽

Graphics Processing

Download Full-text