Design and Development of Texture Filtering Architecture for GPU Application Using Reconfigurable Computing

Graphical Processing Units (GPUs) have become an integral part of today’s mainstream computing systems. They are also being used as reprogrammable General Purpose GPUs (GP-GPUs) to perform complex scientific computations. Reconfigurability is an attractive approach to embedded systems because it allows hardware-level modification. Hence, there is a high demand for GPU designs based on reconfigurable hardware. The texture filter unit is designed to process geometric data such as vertices and convert them into pixels on the screen. This process involves a number of operations, such as circle and cube generation, rotation, and scaling. The texture filter unit is designed with all the hardware needed to handle the different filtering operations. The designed texture filtering units are modelled in Verilog on Altera Quartus II and simulated using ModelSim tools. The functionality of the modelled blocks is verified using test inputs in the simulator. Circle and cube coordinates are generated for circle and cube generation. The work can form the basis for designing a complete reconfigurable GPU.


Efficient Graph Component Labeling on Hybrid CPU and GPU Platforms

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.596.276 ◽

2014 ◽

Vol 596 ◽

pp. 276-279

Author(s):

Xiao Hui Pan

Keyword(s):

High Performance ◽

General Purpose ◽

Gpu Programming ◽

Data Parallel ◽

Graphical Processing Units ◽

Architectural Features ◽

Graph Coloring Problem ◽

Graphical Processing ◽

And Performance ◽

Performance Results

Graph component labeling, which is a subset of the general graph coloring problem, is a computationally expensive operation in many important applications and simulations. A number of data-parallel algorithmic variations to the component labeling problem are possible and we explore their use with general purpose graphical processing units (GPGPUs) and with the CUDA GPU programming language. We discuss implementation issues and performance results on CPUs and GPUs using CUDA. We evaluated our system with real-world graphs. We show how to consider different architectural features of the GPU and the host CPUs and achieve high performance.
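
The abstract does not say which data-parallel variants the authors implemented. As one hedged illustration of the general idea, the sketch below uses minimum-label propagation, a common data-parallel formulation: every vertex starts with its own index as its label, and each sweep over the edges pulls both endpoints toward the smaller label until nothing changes. The function name `label_components` and the edge-list input are assumptions made for this example; on a GPU, each edge (or vertex) in a sweep would typically be handled by its own thread.

```python
def label_components(num_vertices, edges):
    """Label connected components by repeated minimum-label propagation.

    Hypothetical helper for illustration only; it is not taken from the paper.
    Returns a list in which labels[v] is the smallest vertex index in the
    component that contains v.
    """
    labels = list(range(num_vertices))  # each vertex starts in its own component
    changed = True
    while changed:
        changed = False
        # Read from the old labels and write into a copy, mimicking the
        # double-buffered style of a data-parallel kernel launch.
        new_labels = labels[:]
        for u, v in edges:
            low = min(labels[u], labels[v])
            if new_labels[u] > low:
                new_labels[u] = low
                changed = True
            if new_labels[v] > low:
                new_labels[v] = low
                changed = True
        labels = new_labels
    return labels


if __name__ == "__main__":
    # Two components {0, 1, 2} and {3, 4}, plus the isolated vertex 5.
    print(label_components(6, [(0, 1), (1, 2), (3, 4)]))
```

The number of sweeps grows with the diameter of the largest component, which is one reason practical implementations usually add shortcuts such as pointer jumping.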


High-Performance Reconfigurable Computing

Advances in Computer and Electrical Engineering - Advanced Methodologies and Technologies in Network Architecture, Mobile Computing, and Data Analytics ◽

10.4018/978-1-5225-7598-6.ch053 ◽

2019 ◽

pp. 731-744

Author(s):

Mário Pereira Vestias

Keyword(s):

Power Consumption ◽

Integrated Circuit ◽

Reconfigurable Computing ◽

High Performance ◽

General Purpose ◽

Reconfigurable Hardware ◽

Coarse Grained ◽

Lower Power ◽

Fine Grained ◽

Application Specific

High-performance reconfigurable computing systems integrate reconfigurable technology in the computing architecture to improve performance. Besides performance, reconfigurable hardware devices also achieve lower power consumption compared to general-purpose processors. Better performance and lower power consumption could be achieved using application-specific integrated circuit (ASIC) technology. However, ASICs are not reconfigurable, which makes them application specific. Reconfigurable logic becomes a major advantage when hardware flexibility makes it possible to speed up any application with the same hardware module. The first and most common devices used for reconfigurable computing are fine-grained FPGAs, which offer high hardware flexibility. To reduce the performance and area overhead associated with reconfigurability, coarse-grained reconfigurable solutions have been proposed as a way to achieve better performance and lower power consumption. In this chapter, the authors provide a description of reconfigurable hardware for high-performance computing.


High-Performance Reconfigurable Computing

Encyclopedia of Information Science and Technology, Fourth Edition ◽

10.4018/978-1-5225-2255-3.ch348 ◽

2018 ◽

pp. 4018-4029

Author(s):

Mário Pereira Vestias

Keyword(s):

Power Consumption ◽

Integrated Circuit ◽

Reconfigurable Computing ◽

High Performance ◽

General Purpose ◽

Reconfigurable Hardware ◽

Coarse Grained ◽

Lower Power ◽

Fine Grained ◽

Application Specific

High-Performance Reconfigurable Computing systems integrate reconfigurable technology in the computing architecture to improve performance. Besides performance, reconfigurable hardware devices also achieve lower power consumption compared to General-Purpose Processors. Better performance and lower power consumption could be achieved using Application Specific Integrated Circuit (ASIC) technology. However, ASICs are not reconfigurable, which makes them application specific. Reconfigurable logic becomes a major advantage when hardware flexibility makes it possible to speed up any application with the same hardware module. The first and most common devices used for reconfigurable computing are fine-grained FPGAs, which offer high hardware flexibility. To reduce the performance and area overhead associated with reconfigurability, coarse-grained reconfigurable solutions have been proposed as a way to achieve better performance and lower power consumption. In this chapter, we provide a description of reconfigurable hardware for high-performance computing.


Design and Development of Stream Processor Architecture for GPU Application Using Reconfigurable Computing

International Journal of Reconfigurable and Embedded Systems (IJRES) ◽

10.11591/ijres.v2.i1.pp1-14 ◽

2013 ◽

Vol 2 (1) ◽

pp. 1

Author(s):

Sanket Dessai ◽

Krishna Bhushan Vutukuru

Keyword(s):

Reconfigurable Computing ◽

General Purpose ◽

Instruction Level Parallelism ◽

Stream Processor ◽

Subword Parallelism ◽

Host Processor ◽

Graphical Processing ◽

Processor Unit ◽

Pipelined Multiplier ◽

Level Parallelism

Graphical Processing Units (GPUs) have become an integral part of today’s mainstream computing systems. They are also being used as reprogrammable General Purpose GPUs (GP-GPUs) to perform complex scientific computations. Reconfigurability is an attractive approach to embedded systems because it allows hardware-level modification. Hence, there is a high demand for GPU designs based on reconfigurable hardware. A stream processor consists of clusters of functional units that provide a bandwidth hierarchy, supporting hundreds of arithmetic units. The arithmetic cluster units are designed to exploit instruction-level parallelism and subword parallelism within a cluster and data parallelism across the clusters. To reduce area and power, a single controller is used to control data flow between clusters and between the host processor and the GPU. The design of the stream processor unit has been carried out in Verilog on Altera Quartus II and simulated using ModelSim tools. The functionality of the modelled blocks is verified using test inputs in the simulator. The simulated execution time is 60 ps for the 8-bit pipelined multiplier and 100 ns for the 8-bit pipelined adder when operating at 90 MHz.


SIMinG-1k: A thousand-core simulator running on general-purpose graphical processing units

Concurrency and Computation Practice and Experience ◽

10.1002/cpe.2940 ◽

2012 ◽

Vol 25 (10) ◽

pp. 1443-1461 ◽

Cited By ~ 2

Author(s):

Shivani Raghav ◽

Andrea Marongiu ◽

Christian Pinto ◽

Martino Ruggiero ◽

David Atienza ◽

...

Keyword(s):

General Purpose ◽

Graphical Processing Units ◽

Graphical Processing


In Situ Power Analysis of General Purpose Graphical Processing Units

2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing ◽

10.1109/pdp.2011.67 ◽

2011 ◽

Cited By ~ 4

Author(s):

M.Z. Shaikh ◽

M. Gregoire ◽

W. Li ◽

M. Wroblewski ◽

S. Simon

Keyword(s):

Power Analysis ◽

General Purpose ◽

Graphical Processing Units ◽

Graphical Processing


The potential of graphical processing units to solve hydraulic network equations

Journal of Hydroinformatics ◽

10.2166/hydro.2011.023 ◽

2011 ◽

Vol 14 (3) ◽

pp. 603-612 ◽

Cited By ~ 8

Author(s):

P. A. Crous ◽

J. E. van Zyl ◽

Y. Roodt

Keyword(s):

Conjugate Gradient ◽

General Purpose ◽

Gradient Algorithm ◽

Processing Unit ◽

Distribution Models ◽

Data Set ◽

Central Processing ◽

Graphical Processing Units ◽

Hydraulic Network ◽

Graphical Processing

The engineering discipline has relied on computers to perform numerical calculations in many of its sub-disciplines over the last decades. The advent of graphical processing units (GPUs), which are parallel stream processors, has the potential to speed up generic simulations for engineering applications beyond traditional computer graphics, using GPGPU (general purpose programming on the GPU). To benefit from the GPU for general purpose computation, a program must be highly arithmetic intensive and its data must be independent. This paper looks at the specific application of the Conjugate Gradient method used in hydraulic network solvers on the GPU and compares the results to conventional central processing unit (CPU) implementations. The results indicate that the GPU becomes more efficient as the data set size increases. However, with the current hardware and the implementation of the Conjugate Gradient algorithm, the application of stream processing to hydraulic network solvers is only faster and more efficient for exceptionally large water distribution models, which are seldom found in practice.
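
For readers unfamiliar with the method, the following is a minimal sketch of the textbook Conjugate Gradient iteration for a symmetric positive-definite system Ax = b, written in plain Python. It is not the authors' solver: the function names, the tolerance, and the 2x2 test matrix are assumptions chosen for illustration. The matrix-vector product and the dot products are the data-parallel steps that a GPU implementation would typically offload.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A (illustrative sketch)."""
    x = [0.0] * len(b)
    r = [float(bi) for bi in b]  # residual b - A x, with x = 0 initially
    p = r[:]                     # first search direction is the residual
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:  # stop once the residual norm is small enough
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old       # keeps the next direction A-conjugate to p
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    # Small hypothetical system; the exact solution is x = [1/11, 7/11].
    print(conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

Each iteration costs one matrix-vector product plus a few vector operations, so the arithmetic only outweighs the overhead of moving data to the GPU when the system is large, which is consistent with the paper's finding.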


Comparative analysis of software optimization methods in context of branch predication on GPUs

Российский технологический журнал ◽

10.32362/2500-316x-2021-9-6-7-15 ◽

2021 ◽

Vol 9 (6) ◽

pp. 7-15

Author(s):

I. Yu. Sesin ◽

R. G. Bolbakov

Keyword(s):

Optimization Methods ◽

Time Algorithm ◽

General Purpose ◽

Speculative Execution ◽

Adaptive Optimization ◽

Software Optimization ◽

Performance Loss ◽

Graphical Processing Units ◽

Parallel Data ◽

Graphical Processing

General Purpose computing for Graphical Processing Units (GPGPU) technology is a powerful tool for offloading parallel data processing tasks to Graphical Processing Units (GPUs). This technology is used in a variety of domains, from science and commerce to hobbyist projects. GPU-run general-purpose programs will inevitably run into performance issues stemming from code branch predication. Code predication is a GPU feature that makes both conditional branches execute, masking the results of the incorrect branch. This leads to considerable performance losses for GPU programs that have large amounts of code behind conditional operators. This paper analyses existing approaches to improving software performance with a view to relieving this performance loss. A description of each approach is provided, along with its upsides, downsides, extent of applicability, and whether it addresses the outlined problem. The approaches covered are optimizing compilers, JIT compilation, branch prediction, speculative execution, adaptive optimization, run-time algorithm specialization, and profile-guided optimization. It is shown that these methods mostly cater to CPU-specific issues and are generally not applicable as far as branch-predication performance loss is concerned. Lastly, we outline the need for a separate performance-improving approach that addresses the specifics of branch predication and the GPGPU workflow.
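
The cost of predication described in this abstract can be illustrated with a toy model, sketched here in plain Python rather than GPU code. The function `run_predicated`, the example branch bodies, and the one-operation-per-lane cost model are all assumptions made for this example. Every lane evaluates both sides of the conditional, and a per-lane mask then selects which result is kept.

```python
def run_predicated(values):
    """Toy model of a group of lanes executing, under predication:

        if v % 2 == 0: result = v // 2
        else:          result = 3 * v + 1

    Returns the per-lane results and the fraction of issued lane-operations
    whose result was actually kept.
    """
    mask = [v % 2 == 0 for v in values]          # predicate, one bit per lane
    then_results = [v // 2 for v in values]      # every lane runs the 'then' side
    else_results = [3 * v + 1 for v in values]   # every lane runs the 'else' side
    # The mask selects the valid result; the other one is discarded.
    results = [t if m else e
               for m, t, e in zip(mask, then_results, else_results)]
    issued = 2 * len(values)   # both sides were computed for every lane
    kept = len(values)         # only one result per lane survives
    return results, kept / issued


if __name__ == "__main__":
    print(run_predicated([1, 2, 3, 4]))
```

In this simplified model, half of the issued work is always discarded for a two-way branch with equally sized sides. Real hardware behaviour depends on the architecture and the compiler, so the ratio should be read as an illustration of the effect, not a measurement.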


Exploring the Future of Out-of-Core Computing with Compute-Local Non-Volatile Memory

Scientific Programming ◽

10.1155/2014/303810 ◽

2014 ◽

Vol 22 (2) ◽

pp. 125-139 ◽

Cited By ~ 1

Author(s):

Myoungsoo Jung ◽

Ellis H. Wilson ◽

Wonil Choi ◽

John Shalf ◽

Hasan Metin Aktulga ◽

...

Keyword(s):

High Performance ◽

File Systems ◽

Network Capacity ◽

General Purpose ◽

Graphical Processing Units ◽

Non Volatile Memory ◽

Order Of Magnitude ◽

Volatile Memory ◽

Graphical Processing ◽

Point To Point

Drawing parallels to the rise of general purpose graphical processing units (GPGPUs) as accelerators for specific high-performance computing (HPC) workloads, there is a rise in the use of non-volatile memory (NVM) as accelerators for I/O-intensive scientific applications. However, existing work has explored the use of NVM within dedicated I/O nodes, which are distant from the compute nodes that actually need such acceleration. As NVM bandwidth begins to outpace point-to-point network capacity, we argue for the need to break from the archetype of completely separated storage. Therefore, in this work we investigate co-location of NVM and compute by varying I/O interfaces, file systems, types of NVM, and both current and future SSD architectures, uncovering numerous bottlenecks implicit in these various levels of the I/O stack. We present novel hardware and software solutions, including the new Unified File System (UFS), to enable fuller utilization of the new compute-local NVM storage. Our experimental evaluation, which employs a real-world Out-of-Core (OoC) HPC application, demonstrates throughput increases in excess of an order of magnitude over current approaches.


Discrete-event Execution Alternatives on General Purpose Graphical Processing Units (GPGPUs)

20th Workshop on Principles of Advanced and Distributed Simulation (PADS'06) ◽

10.1109/pads.2006.15 ◽

2006 ◽

Cited By ~ 32

Author(s):

K.S. Perumalla

Keyword(s):

Discrete Event ◽

General Purpose ◽

Graphical Processing Units ◽

Graphical Processing
