Discrete-event Execution Alternatives on General Purpose Graphical Processing Units (GPGPUs)

Graph component labeling, which is a subset of the general graph coloring problem, is a computationally expensive operation in many important applications and simulations. A number of data-parallel algorithmic variations to the component labeling problem are possible and we explore their use with general purpose graphical processing units (GPGPUs) and with the CUDA GPU programming language. We discuss implementation issues and performance results on CPUs and GPUs using CUDA. We evaluated our system with real-world graphs. We show how to consider different architectural features of the GPU and the host CPUs and achieve high performance.

Download Full-text

SIMinG-1k: A thousand-core simulator running on general-purpose graphical processing units

Concurrency and Computation Practice and Experience ◽

10.1002/cpe.2940 ◽

2012 ◽

Vol 25 (10) ◽

pp. 1443-1461 ◽

Cited By ~ 2

Author(s):

Shivani Raghav ◽

Andrea Marongiu ◽

Christian Pinto ◽

Martino Ruggiero ◽

David Atienza ◽

...

Keyword(s):

General Purpose ◽

Graphical Processing Units ◽

Graphical Processing

Download Full-text

In Situ Power Analysis of General Purpose Graphical Processing Units

2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing ◽

10.1109/pdp.2011.67 ◽

2011 ◽

Cited By ~ 4

Author(s):

M.Z. Shaikh ◽

M. Gregoire ◽

W. Li ◽

M. Wroblewski ◽

S. Simon

Keyword(s):

Power Analysis ◽

General Purpose ◽

Graphical Processing Units ◽

Graphical Processing

Download Full-text

The potential of graphical processing units to solve hydraulic network equations

Journal of Hydroinformatics ◽

10.2166/hydro.2011.023 ◽

2011 ◽

Vol 14 (3) ◽

pp. 603-612 ◽

Cited By ~ 8

Author(s):

P. A. Crous ◽

J. E. van Zyl ◽

Y. Roodt

Keyword(s):

Conjugate Gradient ◽

General Purpose ◽

Gradient Algorithm ◽

Processing Unit ◽

Distribution Models ◽

Data Set ◽

Central Processing ◽

Graphical Processing Units ◽

Hydraulic Network ◽

Graphical Processing

The Engineering discipline has relied on computers to perform numerical calculations in many of its sub-disciplines over the last decades. The advent of graphical processing units (GPUs), parallel stream processors, has the potential to speed up generic simulations that facilitate engineering applications aside from traditional computer graphics applications, using GPGPU (general purpose programming on the GPU). The potential benefits of exploiting the GPU for general purpose computation require the program to be highly arithmetic intensive and also data independent. This paper looks at the specific application of the Conjugate Gradient method used in hydraulic network solvers on the GPU and compares the results to conventional central processing unit (CPU) implementations. The results indicate that the GPU becomes more efficient as the data set size increases. However, with the current hardware and the implementation of the Conjugate Gradient algorithm, the application of stream processing to hydraulic network solvers is only faster and more efficient for exceptionally large water distribution models, which are seldom found in practice.

Download Full-text

Comparative analysis of software optimization methods in context of branch predication on GPUs

Российский технологический журнал ◽

10.32362/2500-316x-2021-9-6-7-15 ◽

2021 ◽

Vol 9 (6) ◽

pp. 7-15

Author(s):

I. Yu. Sesin ◽

R. G. Bolbakov

Keyword(s):

Optimization Methods ◽

Time Algorithm ◽

General Purpose ◽

Speculative Execution ◽

Adaptive Optimization ◽

Software Optimization ◽

Performance Loss ◽

Graphical Processing Units ◽

Parallel Data ◽

Graphical Processing

General Purpose computing for Graphical Processing Units (GPGPU) technology is a powerful tool for offloading parallel data processing tasks to Graphical Processing Units (GPUs). This technology finds its use in variety of domains – from science and commerce to hobbyists. GPU-run general-purpose programs will inevitably run into performance issues stemming from code branch predication. Code predication is a GPU feature that makes both conditional branches execute, masking the results of incorrect branch. This leads to considerable performance losses for GPU programs that have large amounts of code hidden away behind conditional operators. This paper focuses on the analysis of existing approaches to improving software performance in the context of relieving the aforementioned performance loss. Description of said approaches is provided, along with their upsides, downsides and extents of their applicability and whether they address the outlined problem. Covered approaches include: optimizing compilers, JIT-compilation, branch predictor, speculative execution, adaptive optimization, run-time algorithm specialization, profile-guided optimization. It is shown that the aforementioned methods are mostly catered to CPU-specific issues and are generally not applicable, as far as branch-predication performance loss is concerned. Lastly, we outline the need for a separate performance improving approach, addressing specifics of branch predication and GPGPU workflow.

Download Full-text

Exploring the Future of Out-of-Core Computing with Compute-Local Non-Volatile Memory

Scientific Programming ◽

10.1155/2014/303810 ◽

2014 ◽

Vol 22 (2) ◽

pp. 125-139 ◽

Cited By ~ 1

Author(s):

Myoungsoo Jung ◽

Ellis H. Wilson ◽

Wonil Choi ◽

John Shalf ◽

Hasan Metin Aktulga ◽

...

Keyword(s):

High Performance ◽

File Systems ◽

Network Capacity ◽

General Purpose ◽

Graphical Processing Units ◽

Non Volatile Memory ◽

Order Of Magnitude ◽

Volatile Memory ◽

Graphical Processing ◽

Point To Point

Drawing parallels to the rise of general purpose graphical processing units (GPGPUs) as accelerators for specific high-performance computing (HPC) workloads, there is a rise in the use of non-volatile memory (NVM) as accelerators for I/O-intensive scientific applications. However, existing works have explored use of NVM within dedicated I/O nodes, which are distant from the compute nodes that actually need such acceleration. As NVM bandwidth begins to out-pace point-to-point network capacity, we argue for the need to break from the archetype of completely separated storage. Therefore, in this work we investigate co-location of NVM and compute by varying I/O interfaces, file systems, types of NVM, and both current and future SSD architectures, uncovering numerous bottlenecks implicit in these various levels in the I/O stack. We present novel hardware and software solutions, including the new Unified File System (UFS), to enable fuller utilization of the new compute-local NVM storage. Our experimental evaluation, which employs a real-world Out-of-Core (OoC) HPC application, demonstrates throughput increases in excess of an order of magnitude over current approaches.

Download Full-text

Color and motion-based particle filter target tracking in a network of overlapping cameras with multi-threading and GPGPU

Acta Universitaria ◽

10.15174/au.2013.355 ◽

2013 ◽

Vol 23 (1) ◽

pp. 9-16

Author(s):

Jorge Francisco Madrigal Díaz ◽

Jean-Bernard Hayet

Keyword(s):

Target Tracking ◽

Particle Filters ◽

Target Position ◽

General Purpose ◽

Video Sequences ◽

Central Processing ◽

Graphical Processing Units ◽

Independent Particle ◽

Likelihood Model ◽

Graphical Processing

This paper describes an efficient implementation of multiple-target multiple-view tracking in video-surveillance sequences. It takes advantage of the capabilities of multiple core Central Processing Units (CPUs) and of graphical processing units under the Compute Unifie Device Arquitecture (CUDA) framework. The principle of our algorithm is 1) in each video sequence, to perform tracking on all persons to track by independent particle filters and 2) to fuse the tracking results of all sequences. Particle filters belong to the category of recursive Bayesian filters. They update a Monte-Carlo representation of the posterior distribution over the target position and velocity. For this purpose, they combine a probabilistic motion model, i.e. prior knowledge about how targets move (e.g. constant velocity) and a likelihood model associated to the observations on targets. At this first level of single video sequences, the multi-threading library Threading Buildings Blocks (TBB) has been used to parallelize the processing of the per-target independent particle filters. Afterwards at the higher level, we rely on General Purpose Programming on Graphical Processing Units (generally termed as GPGPU) through CUDA in order to fuse target-tracking data collected on multiple video sequences, by solving the data association problem. Tracking results are presented on various challenging tracking datasets.

Download Full-text

Zero-skipping in CapsNet. Is it worth it?

10.29007/cd8h ◽

2020 ◽

Author(s):

Ramin Sharifi ◽

Pouya Shiri ◽

Amirali Baniasadi

Keyword(s):

Neural Networks ◽

Energy Consumption ◽

Complex Networks ◽

General Purpose ◽

Graphical Processing Units ◽

Different Types ◽

Graphical Processing ◽

Save Energy ◽

Time And Energy

Capsule networks (CapsNet) are the next generation of neural networks. CapsNet can be used for classification of data of different types. Today’s General Purpose Graphical Processing Units (GPGPUs) are more capable than before and let us train these complex networks. However, time and energy consumption remains a challenge. In this work, we investigate if skipping trivial operations i.e. multiplication by zero in CapsNet, can possibly save energy. We base our analysis on the number of multiplications by zero detected while training CapsNet on MNIST and Fashion- MNIST datasets.

Download Full-text

Massively parallel landscape-evolution modelling using general purpose graphical processing units

2012 19th International Conference on High Performance Computing ◽

10.1109/hipc.2012.6507488 ◽

2012 ◽

Cited By ~ 1

Author(s):

A. S. McGough ◽

S. Liang ◽

M. Rapoportas ◽

R. Grey ◽

G. Kumar Vinod ◽

...

Keyword(s):

Landscape Evolution ◽

General Purpose ◽

Massively Parallel ◽

Graphical Processing Units ◽

Graphical Processing

Download Full-text

A Flexible Hybrid BCH Decoder for Modern NAND Flash Memories Using General Purpose Graphical Processing Units (GPGPUs)

Micromachines ◽

10.3390/mi10060365 ◽

2019 ◽

Vol 10 (6) ◽

pp. 365 ◽

Cited By ~ 4

Author(s):

Arul Subbiah ◽

Tokunbo Ogunfunmi

Keyword(s):

Finite Fields ◽

Flash Memory ◽

Search Algorithm ◽

Memory Systems ◽

Block Codes ◽

General Purpose ◽

Bch Codes ◽

Nand Flash ◽

Graphical Processing Units ◽

Graphical Processing

Bose–Chaudhuri–Hocquenghem (BCH) codes are broadly used to correct errors in flash memory systems and digital communications. These codes are cyclic block codes and have their arithmetic fixed over the splitting field of their generator polynomial. There are many solutions proposed using CPUs, hardware, and Graphical Processing Units (GPUs) for the BCH decoders. The performance of these BCH decoders is of ultimate importance for systems involving flash memory. However, it is essential to have a flexible solution to correct multiple bit errors over the different finite fields (GF(2 m )). In this paper, we propose a pragmatic approach to decode BCH codes over the different finite fields using hardware circuits and GPUs in tandem. We propose to employ hardware design for a modified syndrome generator and GPUs for a key-equation solver and an error corrector. Using the above partition, we have shown the ability to support multiple bit errors across different BCH block codes without compromising on the performance. Furthermore, the proposed method to generate modified syndrome has zero latency for scenarios where there are no errors. When there is an error detected, the GPUs are deployed to correct the errors using the iBM and Chien search algorithm. The results have shown that using the modified syndrome approach, we can support different multiple finite fields with high throughput.

Download Full-text