The potential of graphical processing units to solve hydraulic network equations

2011 ◽  
Vol 14 (3) ◽  
pp. 603-612 ◽  
Author(s):  
P. A. Crous ◽  
J. E. van Zyl ◽  
Y. Roodt

The engineering discipline has relied on computers to perform numerical calculations in many of its sub-disciplines over the last few decades. The advent of graphical processing units (GPUs), which are parallel stream processors, has the potential to speed up general simulations that support engineering applications beyond traditional computer graphics, using GPGPU (general-purpose programming on the GPU). Realizing the potential benefits of the GPU for general-purpose computation requires the program to be highly arithmetic-intensive and data-independent. This paper looks at the specific application of the Conjugate Gradient method used in hydraulic network solvers on the GPU and compares the results to conventional central processing unit (CPU) implementations. The results indicate that the GPU becomes more efficient as the data set size increases. However, with the current hardware and the implementation of the Conjugate Gradient algorithm, the application of stream processing to hydraulic network solvers is only faster and more efficient for exceptionally large water distribution models, which are seldom found in practice.
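
For context, the core iteration that such a solver offloads is the Conjugate Gradient loop for a symmetric positive-definite system. The sketch below is a minimal serial illustration in Python (the matrix `A` and all names are illustrative, not taken from the paper); in a GPGPU port, the matrix-vector product and the dot products are the operations mapped to the GPU.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A.

    Illustrative serial version; a GPU implementation would offload
    the matrix-vector product and the dot products below.
    """
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x              # residual
    d = r.copy()               # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ad = A @ d                      # dominant cost: mat-vec product
        alpha = rs_old / (d @ Ad)       # step length along d
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs_old) * d   # conjugate direction update
        rs_old = rs_new
    return x
```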

2013 ◽  
Vol 23 (1) ◽  
pp. 9-16
Author(s):  
Jorge Francisco Madrigal Díaz ◽  
Jean-Bernard Hayet

This paper describes an efficient implementation of multiple-target, multiple-view tracking in video-surveillance sequences. It takes advantage of the capabilities of multi-core Central Processing Units (CPUs) and of graphical processing units under the Compute Unified Device Architecture (CUDA) framework. The principle of our algorithm is (1) in each video sequence, to track every person with an independent particle filter, and (2) to fuse the tracking results of all sequences. Particle filters belong to the category of recursive Bayesian filters. They update a Monte Carlo representation of the posterior distribution over the target position and velocity. For this purpose, they combine a probabilistic motion model, i.e. prior knowledge about how targets move (e.g. constant velocity), and a likelihood model associated with the observations of the targets. At this first level of single video sequences, the multi-threading library Threading Building Blocks (TBB) has been used to parallelize the processing of the per-target independent particle filters. At the higher level, we rely on General-Purpose Programming on Graphical Processing Units (generally termed GPGPU) through CUDA to fuse the target-tracking data collected from multiple video sequences by solving the data association problem. Tracking results are presented on various challenging tracking datasets.
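
As a rough illustration of the per-target filters described above, the following sketch shows one predict/update/resample cycle of a particle filter with a constant-velocity motion model and a Gaussian observation likelihood. The state layout, noise levels, and likelihood are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def particle_filter_step(particles, weights, observation, dt=1.0,
                         process_noise=1.0, obs_noise=5.0):
    """One predict/update/resample cycle for a single target.

    particles: (N, 4) array of [x, y, vx, vy] hypotheses.
    weights:   (N,) normalized weights.
    observation: measured [x, y] position of the target.
    """
    n = len(particles)

    # Predict: constant-velocity motion model plus Gaussian noise.
    particles[:, 0:2] += particles[:, 2:4] * dt
    particles += np.random.normal(0.0, process_noise, particles.shape)

    # Update: weight each particle by its observation likelihood.
    diff = particles[:, 0:2] - observation
    dist_sq = np.sum(diff ** 2, axis=1)
    weights *= np.exp(-0.5 * dist_sq / obs_noise ** 2)
    weights += 1e-300                 # guard against all-zero weights
    weights /= weights.sum()

    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < n / 2:
        idx = np.random.choice(n, size=n, p=weights)
        particles = particles[idx]
        weights = np.full(n, 1.0 / n)

    return particles, weights
```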


2015 ◽  
Vol 138 (1) ◽  
Author(s):  
Amit Amritkar ◽  
Danesh Tafti

Graphical processing unit (GPU) computation has seen extensive growth in recent years due to advancements in both hardware and the software stack. This has led to an increase in the use of GPUs as accelerators across a broad spectrum of applications. This work deals with the use of general-purpose GPUs for performing computational fluid dynamics (CFD) computations. The paper discusses strategies and findings on porting a large multifunctional CFD code to the GPU architecture. Within this framework, the most compute-intensive segment of the software, the BiCGStab linear solver using additive Schwarz block preconditioners with point Jacobi iterative smoothing, is optimized for the GPU platform using various techniques in CUDA Fortran. Representative turbulent channel and pipe flows are investigated for validation and benchmarking purposes. Both single- and double-precision calculations are highlighted. For a modest single-block grid of 64 × 64 × 64, the turbulent channel flow computations showed a speedup of about eightfold in double precision and more than 13-fold in single precision on the NVIDIA Tesla GPU over a serial run on an Intel central processing unit (CPU). For the pipe flow, consisting of 1.78 × 10^6 grid cells distributed over 36 mesh blocks, the gains were more modest at 4.5-fold and 6.5-fold for double and single precision, respectively.
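
To make the smoothing step concrete, here is a minimal point Jacobi smoother in Python; it illustrates the general technique only (every unknown is relaxed independently within a sweep, which is what maps well to GPU threads) and is not the paper's CUDA Fortran code.

```python
import numpy as np

def jacobi_smooth(A, b, x, sweeps=3):
    """A few point Jacobi sweeps on A x = b, used as a smoother.

    Each unknown is updated from the previous iterate only, so all
    components of x can be relaxed in parallel within a sweep.
    """
    d = np.diag(A)               # diagonal of A
    R = A - np.diagflat(d)       # off-diagonal part of A
    x = np.asarray(x, dtype=float)
    for _ in range(sweeps):
        x = (b - R @ x) / d      # every component updated independently
    return x
```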


2018 ◽  
Vol 9 (1) ◽  
pp. 35-48 ◽  
Author(s):  
Izabele Marquetti ◽  
Jhonatam Rodrigues ◽  
Salil S. Desai

Molecular dynamics (MD) models require substantial computational power to simulate nanoscale phenomena. Traditionally, central processing unit (CPU) clusters have been the standard means of performing these numerically intensive computations. This article investigates the use of graphical processing units (GPUs) to implement large-scale MD models for exploring nanofluidic-substrate interactions. MD models of water nanodroplets over a flat silicon substrate are tracked until the simulation attains steady-state computational performance. Different classes of GPU units from NVIDIA (C2050, K20, and K40) are evaluated for energy-efficiency performance with respect to three green-computing measures: simulation completion time, power consumption, and CO2 emissions. The CPU+K40 configuration displayed the lowest energy-consumption profile across all the measures. This research demonstrates the use of energy-efficient graphical computing versus traditional CPU computing for high-performance molecular dynamics simulations.


Author(s):  
Nur Syarafina Mohamed ◽  
Mustafa Mamat ◽  
Mohd Rivaie ◽  
Shazlyn Milleana Shaharudin

One of the popular approaches to modifying the Conjugate Gradient (CG) method is hybridization. In this paper, a new hybrid CG is introduced and its performance is compared to two classical CG methods, the Rivaie-Mustafa-Ismail-Leong (RMIL) and Syarafina-Mustafa-Rivaie (SMR) methods. The proposed hybrid CG is constructed as a convex combination of the RMIL and SMR methods. Their performance is analyzed under the exact line search. The comparison showed that the hybrid CG is promising and outperformed the classical RMIL and SMR methods in terms of the number of iterations and central processing unit (CPU) time.
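
The convex-combination idea can be sketched as follows. The RMIL formula shown is the one commonly stated in the literature, while the SMR parameter is left as an input because its exact form is defined in the paper; all function names here are illustrative assumptions.

```python
import numpy as np

def beta_rmil(g_new, g_old, d_old):
    """RMIL conjugate gradient parameter, as commonly stated in the
    literature: g_{k+1}^T (g_{k+1} - g_k) / (d_k^T d_k)."""
    return float(g_new @ (g_new - g_old)) / float(d_old @ d_old)

def hybrid_beta(g_new, g_old, d_old, beta_smr, theta=0.5):
    """Convex combination of the RMIL parameter with a second
    parameter beta_smr, weighted by theta in [0, 1]."""
    return (1.0 - theta) * beta_rmil(g_new, g_old, d_old) + theta * beta_smr

def next_direction(g_new, g_old, d_old, beta_smr, theta=0.5):
    """Standard CG direction update: d_{k+1} = -g_{k+1} + beta_k d_k."""
    beta = hybrid_beta(g_new, g_old, d_old, beta_smr, theta)
    return -g_new + beta * d_old
```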


2014 ◽  
Vol 596 ◽  
pp. 276-279
Author(s):  
Xiao Hui Pan

Graph component labeling, which is a subset of the general graph coloring problem, is a computationally expensive operation in many important applications and simulations. A number of data-parallel algorithmic variations to the component labeling problem are possible, and we explore their use with general-purpose graphical processing units (GPGPUs) and the CUDA GPU programming model. We discuss implementation issues and performance results on CPUs and GPUs using CUDA, and evaluate our system with real-world graphs. We show how taking account of the different architectural features of the GPU and the host CPUs leads to high performance.
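
One common data-parallel formulation of component labeling initializes each vertex's label to its own index and repeatedly replaces it with the minimum label in its neighborhood until nothing changes; on a GPU, each update maps naturally to one thread. The Python sketch below is a generic illustration of this scheme, not the paper's specific variant.

```python
import numpy as np

def label_components(num_vertices, edges):
    """Iterative min-label propagation for connected components.

    edges: iterable of (u, v) undirected edge pairs.
    Each pass updates labels from the previous pass only, which is
    the independence exploited by data-parallel GPU versions.
    """
    labels = np.arange(num_vertices)
    edge_list = list(edges)
    changed = True
    while changed:
        changed = False
        new_labels = labels.copy()
        for u, v in edge_list:           # each edge handled independently
            low = min(labels[u], labels[v])
            if low < new_labels[u]:
                new_labels[u] = low
                changed = True
            if low < new_labels[v]:
                new_labels[v] = low
                changed = True
        labels = new_labels
    return labels

# Example: two components {0, 1, 2} and {3, 4}
print(label_components(5, [(0, 1), (1, 2), (3, 4)]))
```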


1980 ◽  
Vol 24 (02) ◽  
pp. 101-113 ◽  
Author(s):  
Owen F. Hughes ◽  
Farrokh Mistree ◽  
Vedran Žanic

A practical, rationally based method is presented for the automated optimum design of ship structures. The method required the development of (a) a rapid, design-oriented finite-element program for the analysis of ship structures; (b) a comprehensive mathematical model for the evaluation of the capability of the structure; and (c) a cost-effective optimization algorithm for the solution of a large, highly constrained, nonlinear redesign problem. These developments have been incorporated into a program called SHIPOPT. The efficiency and robustness of the method are illustrated by using it to determine the optimum design of a complete cargo hold of a general-purpose cargo ship. The overall dimensions and the design loads are the same as those used in the design of the very successful SD14 series of ships. The redesign problem contains 94 variables, a nonlinear objective function, and over 500 constraints, of which approximately half are nonlinear. Program SHIPOPT required approximately eight minutes of central processing unit time on a CDC CYBER 171 to determine the optimum design.


Processes ◽  
2020 ◽  
Vol 8 (9) ◽  
pp. 1199
Author(s):  
Ravie Chandren Muniyandi ◽  
Ali Maroosi

Long-timescale simulations of biological processes such as photosynthesis, or attempts to solve NP-hard problems such as the traveling salesman, knapsack, Hamiltonian path, and satisfiability problems, can take hours or days when membrane systems are executed without appropriate parallelization. Graphics processing units (GPUs) deliver a massively parallel mechanism for general-purpose computation. Previous studies mapped one membrane to one thread block on the GPU. This is disadvantageous because, when the number of objects in each membrane is small, the number of active threads is also small, which decreases performance. Moreover, when each membrane is assigned to one thread block, communication between membranes must be carried out as communication between thread blocks, which is a time-consuming process. Previous approaches have also not addressed the issue of GPU occupancy. This study presents a classification algorithm that manages dependent objects and membranes based on the communication rate associated with a defined weighted network and assigns them to sub-matrices. Thus, dependent objects and membranes are allocated to the same threads and thread blocks, decreasing communication between threads and thread blocks and allowing the GPU to maintain the highest occupancy possible. The experimental results indicate that, for 48 objects per membrane, the algorithm facilitates a 93-fold increase in processing speed, compared to a 1.6-fold increase with previous algorithms.
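
The grouping idea can be pictured with a simple greedy clustering over a communication-weight matrix: membranes that communicate heavily end up in the same group and hence, on the GPU, in the same thread block. The routine below is a simplified stand-in for the paper's classification algorithm; the names and the greedy strategy are assumptions for illustration only.

```python
import numpy as np

def group_by_communication(weights, group_size):
    """Greedy grouping of membranes by pairwise communication weight.

    weights: (n, n) symmetric matrix, weights[i, j] = communication
             rate between membranes i and j.
    group_size: membranes per group (e.g. per thread block).
    """
    n = weights.shape[0]
    unassigned = set(range(n))
    groups = []
    while unassigned:
        # start a group from the membrane with the most total traffic
        seed = max(unassigned, key=lambda i: weights[i].sum())
        group = [seed]
        unassigned.remove(seed)
        while len(group) < group_size and unassigned:
            # add the membrane that communicates most with the group
            best = max(unassigned, key=lambda j: weights[group, j].sum())
            group.append(best)
            unassigned.remove(best)
        groups.append(group)
    return groups
```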


2016 ◽  
Vol 2016 ◽  
pp. 1-23 ◽  
Author(s):  
Rostam Affendi Hamzah ◽  
Haidi Ibrahim

This paper presents a literature survey of existing disparity map algorithms. It focuses on the four main stages of processing proposed by Scharstein and Szeliski in their 2002 taxonomy and evaluation of dense two-frame stereo correspondence algorithms. To assist future researchers in developing their own stereo matching algorithms, a summary of the existing algorithms developed for each stage of processing is also provided. The survey also notes the implementation of previous software-based and hardware-based algorithms. Generally, the main processing module for a software-based implementation uses only a central processing unit. By contrast, a hardware-based implementation requires one or more additional processors for its processing module, such as a graphical processing unit or a field-programmable gate array. This literature survey also presents a method of qualitative measurement that is widely used by researchers in the area of stereo vision disparity mapping.
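
For reference, the four stages of the Scharstein-Szeliski taxonomy (matching cost computation, cost aggregation, disparity computation, and disparity refinement) can be illustrated with a minimal sum-of-absolute-differences block matcher; the window size and disparity range below are arbitrary illustrative choices, and the refinement stage is omitted.

```python
import numpy as np

def disparity_map(left, right, max_disp=16, window=5):
    """Minimal winner-take-all stereo matcher on grayscale images.

    Stage 1: per-pixel absolute-difference matching cost.
    Stage 2: cost aggregation over a square window (crude box filter).
    Stage 3: disparity computation by winner-take-all.
    (Stage 4, refinement, is omitted in this sketch.)
    """
    h, w = left.shape
    half = window // 2
    costs = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        # Stage 1: matching cost for disparity d
        diff = np.abs(left[:, d:].astype(float) - right[:, :w - d].astype(float))
        # Stage 2: aggregate the cost over the window
        agg = np.zeros_like(diff)
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                agg += np.roll(np.roll(diff, dy, axis=0), dx, axis=1)
        costs[d, :, d:] = agg
    # Stage 3: pick the disparity with the lowest aggregated cost
    return np.argmin(costs, axis=0)
```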


2012 ◽  
Vol 25 (10) ◽  
pp. 1443-1461 ◽  
Author(s):  
Shivani Raghav ◽  
Andrea Marongiu ◽  
Christian Pinto ◽  
Martino Ruggiero ◽  
David Atienza ◽  
...  

2015 ◽  
Vol 2015 ◽  
pp. 1-14 ◽  
Author(s):  
David Couturier ◽  
Michel R. Dagenais

As computation schemes evolve and many new tools become available to enhance the performance of applications, many programmers have started to look towards highly parallel platforms such as the Graphical Processing Unit (GPU). Offloading computations that can take advantage of the GPU architecture is a technique that has proven fruitful in recent years. This technology enhances the speed and responsiveness of applications; as a side effect, it also reduces their power requirements, which extends the battery life of portable devices and helps computing clusters run more power-efficiently. Many performance analysis tools, such as LTTng, strace, and SystemTap, already allow Central Processing Unit (CPU) tracing and help programmers to use CPU resources more efficiently. On the GPU side, tools such as NVIDIA's Nsight, AMD's CodeXL, and the third-party TAU and VampirTrace allow tracing of Application Programming Interface (API) calls and OpenCL kernel execution. These tools are useful but completely separate, and none of them allows a unified CPU-GPU tracing experience. We propose an extension to the existing scalable and highly efficient LTTng tracing platform to allow unified tracing of the GPU alongside LTTng's full CPU tracing capabilities.

