OORS: An object-oriented rewrite system

Gernot Gebhard; Philipp Lucas

doi:10.2298/csis0702002g

Computer Science and Information Systems ◽

10.2298/csis0702002g ◽

2007 ◽

Vol 4 (2) ◽

pp. 2-26

Author(s):

Gernot Gebhard ◽

Philipp Lucas

Keyword(s):

Code Generation ◽

Graphics Processing Units ◽

Object Oriented ◽

Graphics Hardware ◽

Code Optimization ◽

Target Architecture ◽

Rewrite Rules ◽

Graphics Processing ◽

Traditional Approaches ◽

Rewrite System

Retargeting a compiler's back end to a new architecture is a time-consuming process. This becomes an evident problem in the area of programmable graphics hardware (graphics processing units, GPUs) or embedded processors, where architectural changes are faster than elsewhere. We propose the object-oriented rewrite system OORS to overcome this problem. Using the OORS language, a compiler developer can express the code generation and optimization phase in terms of cost-annotated rewrite rules supporting complex non-linear matching and replacing patterns. Retargetability is achieved by organizing rules into profiles, one for each supported target architecture. Featuring a rule and profile inheritance mechanism, OORS makes the reuse of existing specifications possible. This is an improvement regarding traditional approaches. Altogether OORS increases the maintainability of the compiler's back end and thus both decreases the complexity and reduces the effort of the retargeting process. To show the potential of this approach, we have implemented a code generation and a code optimization pattern matcher supporting different target architectures using the OORS language and introduced them in a compiler of a programming language for CPUs and GPUs.
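
The ideas named in this abstract (cost-annotated rules, non-linear patterns, and per-target profiles with inheritance) can be illustrated with a small Python sketch. This is not OORS syntax and not the authors' implementation: the tuple-based term encoding, the `?x` variable convention, the `Rule` and `Profile` classes, and the example rules and costs are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical encoding: terms are nested tuples such as
# ("add", ("var", "a"), ("var", "a")); strings starting with "?" are
# pattern variables.

def match(pattern: Any, term: Any, env: dict) -> Optional[dict]:
    """Return extended bindings if pattern matches term, else None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in env:
            # Non-linear pattern: a repeated variable must bind equal subterms.
            return env if env[pattern] == term else None
        return {**env, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            env = match(p, t, env)
            if env is None:
                return None
        return env
    return env if pattern == term else None

def substitute(template: Any, env: dict) -> Any:
    """Instantiate a replacement template with the matched bindings."""
    if isinstance(template, str) and template.startswith("?"):
        return env[template]
    if isinstance(template, tuple):
        return tuple(substitute(t, env) for t in template)
    return template

@dataclass(frozen=True)
class Rule:
    name: str
    pattern: Any
    replacement: Any
    cost: int  # illustrative cost annotation; lower is preferred

class Profile:
    """A rule set for one target; a child inherits and may override rules by name."""
    def __init__(self, name: str, rules: list, parent: "Optional[Profile]" = None):
        self.name = name
        self.parent = parent
        self.own = {r.name: r for r in rules}

    def rules(self) -> dict:
        inherited = self.parent.rules() if self.parent else {}
        return {**inherited, **self.own}

    def rewrite(self, term: Any) -> Any:
        """Apply the cheapest rule matching at the root, or return term unchanged."""
        candidates = []
        for rule in self.rules().values():
            env = match(rule.pattern, term, {})
            if env is not None:
                candidates.append((rule.cost, rule.name, substitute(rule.replacement, env)))
        if not candidates:
            return term
        return min(candidates, key=lambda c: (c[0], c[1]))[2]

# A generic profile and a hypothetical "gpu" profile that overrides one rule.
generic = Profile("generic", [
    Rule("double", ("add", "?x", "?x"), ("mul", 2, "?x"), cost=4),
    Rule("add_zero", ("add", "?x", 0), "?x", cost=0),
])
gpu = Profile("gpu", [
    Rule("double", ("add", "?x", "?x"), ("shl", "?x", 1), cost=1),
], parent=generic)

term = ("add", ("var", "a"), ("var", "a"))
result_generic = generic.rewrite(term)
result_gpu = gpu.rewrite(term)
```

In this sketch the `gpu` profile reuses `add_zero` from `generic` unchanged and replaces only `double`, which mirrors the kind of specification reuse the abstract attributes to rule and profile inheritance.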

Brian2GeNN: a system for accelerating a large variety of spiking neural networks with graphics hardware

10.1101/448050 ◽

2018 ◽

Cited By ~ 4

Author(s):

Marcel Stimberg ◽

Dan F. M. Goodman ◽

Thomas Nowotny

Keyword(s):

Neural Networks ◽

Code Generation ◽

Graphics Processing Units ◽

High Performance ◽

Spiking Neural Networks ◽

Graphics Hardware ◽

Performance Grade ◽

Network Simulations ◽

Nvidia Gpu ◽

Graphics Processing

“Brian” is a popular Python-based simulator for spiking neural networks, commonly used in computational neuroscience. GeNN is a C++-based meta-compiler for accelerating spiking neural network simulations using consumer or high performance grade graphics processing units (GPUs). Here we introduce a new software package, Brian2GeNN, that connects the two systems so that users can make use of GeNN GPU acceleration when developing their models in Brian, without requiring any technical knowledge about GPUs, C++ or GeNN. The new Brian2GeNN software uses a pipeline of code generation to translate Brian scripts into C++ code that can be used as input to GeNN, and subsequently can be run on suitable NVIDIA GPU accelerators. From the user’s perspective, the entire pipeline is invoked by adding two simple lines to their Brian scripts. We have shown that using Brian2GeNN, typical models can run tens to hundreds of times faster than on CPU.

Developing Extensible Lattice-Boltzmann Simulators for General-Purpose Graphics-Processing Units

Communications in Computational Physics ◽

10.4208/cicp.351011.260112s ◽

2013 ◽

Vol 13 (3) ◽

pp. 867-879 ◽

Cited By ~ 6

Author(s):

Stuart D. C. Walsh ◽

Martin O. Saar

Keyword(s):

Code Generation ◽

Lattice Boltzmann ◽

Graphics Processing Units ◽

Parallel Implementation ◽

General Purpose ◽

Lattice Boltzmann Simulation ◽

Lattice Boltzmann Simulations ◽

Gpu Architectures ◽

Automatic Code ◽

Graphics Processing

Lattice-Boltzmann methods are versatile numerical modeling techniques capable of reproducing a wide variety of fluid-mechanical behavior. These methods are well suited to parallel implementation, particularly on the single-instruction multiple data (SIMD) parallel processing environments found in computer graphics processing units (GPUs). Although recent programming tools dramatically improve the ease with which GPU-based applications can be written, the programming environment still lacks the flexibility available to more traditional CPU programs. In particular, it may be difficult to develop modular and extensible programs that require variable on-device functionality with current GPU architectures. This paper describes a process of automatic code generation that overcomes these difficulties for lattice-Boltzmann simulations. It details the development of GPU-based modules for an extensible lattice-Boltzmann simulation package – LBHydra. The performance of the automatically generated code is compared to equivalent purpose-written codes for both single-phase, multiphase, and multicomponent flows. The flexibility of the new method is demonstrated by simulating a rising, dissolving droplet moving through a porous medium with user generated lattice-Boltzmann models and subroutines.

Method for Adaptation of Algorithms to GPU Architecture

10.20948/graphicon-2021-3027-930-941 ◽

2021 ◽

Author(s):

Vadim Bulavintsev ◽

Dmitry Zhdanov

Keyword(s):

Graphics Processing Units ◽

Search Algorithm ◽

Boolean Satisfiability ◽

Control Flow ◽

Code Optimization ◽

Search Performance ◽

Backtracking Search ◽

Boolean Satisfiability Problem ◽

Graphics Processing ◽

Gpu Architecture

We propose a generalized method for adapting and optimizing algorithms for efficient execution on modern graphics processing units (GPUs). The method consists of several steps. First, build a control flow graph (CFG) of the algorithm. Next, transform the CFG into a tree of loops and merge non-parallelizable loops into parallelizable ones. Finally, map the resulting loop tree to the tree of GPU computational units, unrolling the algorithm's loops as necessary for the match. The mapping should be performed bottom-up, from the lowest GPU architecture levels to the highest ones, to minimize off-chip memory access and maximize register file usage. The method provides the programmer with a convenient and robust mental framework and strategy for GPU code optimization. We demonstrate the method by adapting the DPLL backtracking search algorithm for solving the Boolean satisfiability problem (SAT) to a GPU. The resulting GPU version of DPLL outperforms the CPU version in raw tree search performance sixfold for regular Boolean satisfiability problems and twofold for irregular ones.

CU++: an object oriented framework for computational fluid dynamics applications using graphics processing units

The Journal of Supercomputing ◽

10.1007/s11227-013-0985-9 ◽

2013 ◽

Vol 67 (1) ◽

pp. 47-68 ◽

Cited By ~ 6

Author(s):

Dominic D. J. Chandar ◽

Jayanarayanan Sitaraman ◽

Dimitri Mavriplis

Keyword(s):

Fluid Dynamics ◽

Computational Fluid Dynamics ◽

Graphics Processing Units ◽

Object Oriented ◽

Graphics Processing

Finite-Difference Micromagnetic Solvers With the Object-Oriented Micromagnetic Framework on Graphics Processing Units

IEEE Transactions on Magnetics ◽

10.1109/tmag.2015.2503262 ◽

2016 ◽

Vol 52 (4) ◽

pp. 1-9 ◽

Cited By ~ 9

Author(s):

Sidi Fu ◽

Weilong Cui ◽

Matthew Hu ◽

Ruinan Chang ◽

Michael J. Donahue ◽

...

Keyword(s):

Finite Difference ◽

Graphics Processing Units ◽

Object Oriented ◽

Graphics Processing

High performance computing on graphics processing units

Pollack Periodica ◽

10.1556/pollack.3.2008.2.3 ◽

2008 ◽

Vol 3 (2) ◽

pp. 27-34 ◽

Cited By ~ 2

Author(s):

Balázs Tukora ◽

Tibor Szalay

Keyword(s):

High Performance Computing ◽

Graphics Processing Units ◽

High Performance ◽

Graphics Processing ◽

Performance Computing

Parallel Option Pricing with Fourier Space Time-Stepping Method on Graphics Processing Units

SSRN Electronic Journal ◽

10.2139/ssrn.1020207 ◽

2007 ◽

Cited By ~ 1

Author(s):

Vladimir Surkov

Keyword(s):

Option Pricing ◽

Graphics Processing Units ◽

Space Time ◽

Fourier Space ◽

Time Stepping ◽

Graphics Processing

Improving the Efficiency and the Accuracy of 2D Gel Electrophoresis Spot Detection Using Graphics Processing Units

Current Bioinformatics ◽

10.2174/1574893612666170725141905 ◽

2018 ◽

Vol 13 (2) ◽

pp. 193-206 ◽

Cited By ~ 1

Author(s):

Marwa K. Elteir ◽

Shaheera A. Rashwan ◽

Ashraf A. Khalil

Keyword(s):

Gel Electrophoresis ◽

Graphics Processing Units ◽

2D Gel Electrophoresis ◽

Spot Detection ◽

2D Gel ◽

Graphics Processing

Using graphics processing units on the cloud to accelerate and reduce processing cost of parameters estimation of seismic processing algorithm

10.22564/16cisbgf2019.221 ◽

2019 ◽

Author(s):

Nicholas Okita ◽

Tiago Coimbra ◽

José Ribeiro ◽

Martin Tygel

Keyword(s):

Graphics Processing Units ◽

Parameters Estimation ◽

Processing Algorithm ◽

Seismic Processing ◽

Processing Cost ◽

Graphics Processing

Review of smoothed particle hydrodynamics: towards converged Lagrangian flow modelling

Proceedings of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rspa.2019.0801 ◽

2020 ◽

Vol 476 (2241) ◽

pp. 20190801

Author(s):

Steven J. Lind ◽

Benedict D. Rogers ◽

Peter K. Stansby

Keyword(s):

Smoothed Particle Hydrodynamics ◽

Graphics Processing Units ◽

Wave Structure ◽

Free Form ◽

Mesh Free ◽

Weakly Compressible ◽

Particle Hydrodynamics ◽

Massively Parallel Computing ◽

Smoothed Particle ◽

Graphics Processing

This paper presents a review of the progress of smoothed particle hydrodynamics (SPH) towards high-order converged simulations. As a mesh-free Lagrangian method suitable for complex flows with interfaces and multiple phases, SPH has developed considerably in the past decade. While original applications were in astrophysics, early engineering applications showed the versatility and robustness of the method without emphasis on accuracy and convergence. The early method was of weakly compressible form resulting in noisy pressures due to spurious pressure waves. This was effectively removed in the incompressible (divergence-free) form which followed; since then the weakly compressible form has been advanced, reducing pressure noise. Now numerical convergence studies are standard. While the method is computationally demanding on conventional processors, it is well suited to parallel processing on massively parallel computing and graphics processing units. Applications are diverse and encompass wave–structure interaction, geophysical flows due to landslides, nuclear sludge flows, welding, gearbox flows and many others. In the state of the art, convergence is typically between the first- and second-order theoretical limits. Recent advances are improving convergence to fourth order (and higher) and these will also be outlined. This can be necessary to resolve multi-scale aspects of turbulent flow.
