Reduction of computation time using Graphics Processing Unit for the detection of a crack in a large scale concrete structure

2012 ◽  
Vol 131 (4) ◽  
pp. 3443-3443 ◽  
Author(s):  
Yuhei Katsurakawa ◽  
Toyota Fujioka ◽  
Yoshifumi Nagata ◽  
Masato Abe


Author(s):
Alan Gray ◽  
Kevin Stratford

Leading high performance computing systems achieve their status through use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data parallel hardware in a platform agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this through provision of scaling results on traditional and graphics processing unit-accelerated large scale supercomputers.
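For readers unfamiliar with this style of abstraction layer, the following is a minimal sketch (in C++, with illustrative macro names, not the actual targetDP API) of how a grid-based kernel can be written once and mapped to different data-parallel back ends:

```cpp
// Minimal sketch of a targetDP-style abstraction: the kernel body is written
// once against a lightweight macro, which maps the index space either to an
// OpenMP loop (CPU back end) or to a serial fallback. The macro name
// TARGET_LOOP is illustrative only, not part of targetDP.
#include <cstdio>
#include <vector>

#ifdef USE_OPENMP_BACKEND
  #define TARGET_LOOP(i, n) _Pragma("omp parallel for") \
      for (int i = 0; i < (n); ++i)
#else
  #define TARGET_LOOP(i, n) for (int i = 0; i < (n); ++i)   // serial fallback
#endif

// A grid-based kernel written once: a simple 1D stencil update.
void stencil_update(const double* in, double* out, int n) {
  TARGET_LOOP(i, n) {
    const int left  = (i == 0)     ? i : i - 1;
    const int right = (i == n - 1) ? i : i + 1;
    out[i] = 0.5 * in[i] + 0.25 * (in[left] + in[right]);
  }
}

int main() {
  const int n = 1 << 20;
  std::vector<double> a(n, 1.0), b(n, 0.0);
  stencil_update(a.data(), b.data(), n);
  std::printf("b[0] = %f\n", b[0]);
  return 0;
}
```

The same single-source idea extends to a GPU back end by having the macro expand to a device launch instead of a loop, which is the portability property assessed in the paper.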


2020 ◽  
Vol 10 (7) ◽  
pp. 2359
Author(s):  
Sajad Mohammadi ◽  
Hamidreza Karami ◽  
Mohammad Azadifar ◽  
Farhad Rachidi

An open accelerator (OpenACC)-aided graphics processing unit (GPU)-based finite difference time domain (FDTD) method is presented for the first time for the 3D evaluation of lightning radiated electromagnetic fields along a complex terrain with arbitrary topography. The OpenACC directive-based programming model is used to enhance the computational performance, and the results are compared with those obtained by using a CPU-based model. It is shown that OpenACC GPUs can provide very accurate results, and they are more than 20 times faster than CPUs. The presented results support the use of OpenACC not only in relation to lightning electromagnetics problems, but also to large-scale realistic electromagnetic compatibility (EMC) applications in which computation time efficiency is a critical factor.
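As an illustration of the directive-based approach, here is a minimal sketch (not the authors' code; the array layout, coefficients, and data clauses are assumptions) of an FDTD-style field update offloaded with OpenACC:

```cpp
// Sketch of a single FDTD field-update loop nest annotated with OpenACC.
// The directive asks the compiler to generate a GPU kernel over the 3D grid;
// grid dimensions and the update coefficient c are illustrative placeholders.
#include <vector>

void update_ez(std::vector<double>& ez,
               const std::vector<double>& hx,
               const std::vector<double>& hy,
               int nx, int ny, int nz, double c) {
  double* ezp = ez.data();
  const double* hxp = hx.data();
  const double* hyp = hy.data();
  // Offload the triple loop; each (i, j, k) cell is updated independently.
  #pragma acc parallel loop collapse(3) copy(ezp[0:nx*ny*nz]) \
                            copyin(hxp[0:nx*ny*nz], hyp[0:nx*ny*nz])
  for (int i = 1; i < nx - 1; ++i)
    for (int j = 1; j < ny - 1; ++j)
      for (int k = 0; k < nz; ++k) {
        const int id = (i * ny + j) * nz + k;
        const int im = ((i - 1) * ny + j) * nz + k;   // neighbour in x
        const int jm = (i * ny + (j - 1)) * nz + k;   // neighbour in y
        ezp[id] += c * ((hyp[id] - hyp[im]) - (hxp[id] - hxp[jm]));
      }
}
```

Without an OpenACC-capable compiler the pragma is simply ignored and the same source runs on the CPU, which is how the CPU and GPU timings can be compared from one code base.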


Author(s):  
Timothy Dykes ◽  
Claudio Gheller ◽  
Marzia Rivi ◽  
Mel Krokos

With the increasing size and complexity of data produced by large-scale numerical simulations, it is of primary importance for scientists to be able to exploit all available hardware in heterogeneous high-performance computing environments for increased throughput and efficiency. We focus on the porting and optimization of Splotch, a scalable visualization algorithm, to utilize the Xeon Phi, Intel's coprocessor based upon the Many Integrated Core architecture. We discuss steps taken to offload data to the coprocessor and algorithmic modifications to aid faster processing on the many-core architecture and make use of the uniquely wide vector capabilities of the device, with accompanying performance results using multiple Xeon Phi coprocessors. Finally, we compare performance against results achieved with the Graphics Processing Unit (GPU) based implementation of Splotch.
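As a rough illustration of the kind of vector-friendly restructuring discussed, the following sketch (not Splotch itself; the particle layout and splat kernel are assumptions) annotates the innermost pixel loop of a particle-splatting routine for SIMD execution:

```cpp
// Rough sketch of a particle "splatting" kernel whose innermost pixel loop is
// annotated for SIMD execution, the kind of restructuring that exploits the
// wide 512-bit vector units of the Xeon Phi. The structure-of-arrays layout
// and the Gaussian weight are illustrative assumptions.
#include <cmath>
#include <vector>

struct Particles {                  // structure-of-arrays for unit-stride loads
  std::vector<float> x, y, r, lum;
};

void splat(const Particles& p, std::vector<float>& image, int width, int height) {
  const int n = static_cast<int>(p.x.size());
  // In a Splotch-like code the particle loop is also threaded, with per-thread
  // partial images to avoid write conflicts; that is omitted here.
  for (int i = 0; i < n; ++i) {
    const int cx  = static_cast<int>(p.x[i]);
    const int cy  = static_cast<int>(p.y[i]);
    const int rad = static_cast<int>(std::ceil(p.r[i]));
    const float inv = 1.0f / (p.r[i] * p.r[i] + 1e-6f);
    for (int dy = -rad; dy <= rad; ++dy) {
      const int py = cy + dy;
      if (py < 0 || py >= height) continue;
      // Vectorize across the row of pixels covered by this splat.
      #pragma omp simd
      for (int dx = -rad; dx <= rad; ++dx) {
        const int px = cx + dx;
        if (px < 0 || px >= width) continue;
        const float w = std::exp(-static_cast<float>(dx * dx + dy * dy) * inv);
        image[py * width + px] += w * p.lum[i];
      }
    }
  }
}
```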


Author(s):  
Masatomo Inui ◽  
Kouhei Nishimiya ◽  
Nobuyuki Umezu

Clearance is a basic parameter in the design of mechanical products, generally specified as the distance between two shape elements, for example, the width of a slot. This definition is unsuitable for evaluating the clearance during assembly or manufacturing tasks, where the depth information is also critical. In this paper, we propose a novel definition of clearance for the surface of three-dimensional objects. Unlike typical definitions of clearance, the proposed one simultaneously captures the relationship between width and depth, and thus provides an intuitive understanding of the assembly and manufacturing capability of a product. Our definition is based on the accessibility cone of a point on the object's surface; the peak angle of the accessibility cone corresponds to the clearance at this point. A method for computing the clearance is presented, and the results of its application are demonstrated. Our method uses the rendering function of a graphics processing unit to compute the clearance. The large computation time required for the analysis remains an obstacle to the practical use of this clearance definition.
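A conceptual, CPU-only sketch of the clearance definition is given below (the paper itself uses GPU rendering; the direction-sampling scheme and the occlusion callback here are assumptions for illustration only):

```cpp
// Conceptual sketch: at a surface point, sample directions around the normal
// and estimate the widest cone of directions that escape the part without
// hitting other geometry; the peak (apex) angle of that cone is the clearance.
// isBlocked(dir) is a hypothetical callback supplied by the caller that tests
// whether a ray from the point in direction dir hits the object.
#include <cmath>
#include <functional>

struct Vec3 { double x, y, z; };

double clearanceAngle(const Vec3& normal,
                      const std::function<bool(const Vec3&)>& isBlocked,
                      int azimuthSamples = 64, int polarSamples = 32) {
  // Build an orthonormal frame (t1, t2, normal); assumes normal is unit length.
  Vec3 t1 = std::fabs(normal.x) < 0.9 ? Vec3{0.0, -normal.z, normal.y}
                                      : Vec3{-normal.z, 0.0, normal.x};
  const double len = std::sqrt(t1.x * t1.x + t1.y * t1.y + t1.z * t1.z);
  t1 = {t1.x / len, t1.y / len, t1.z / len};
  const Vec3 t2 = {normal.y * t1.z - normal.z * t1.y,
                   normal.z * t1.x - normal.x * t1.z,
                   normal.x * t1.y - normal.y * t1.x};

  const double pi = 3.14159265358979323846;
  // Sweep the polar angle outward from the normal; the cone is limited by the
  // first blocked direction found at any azimuth for that polar angle.
  for (int ip = 1; ip <= polarSamples; ++ip) {
    const double theta = 0.5 * pi * ip / polarSamples;    // 0..90 degrees
    const double s = std::sin(theta), c = std::cos(theta);
    for (int ia = 0; ia < azimuthSamples; ++ia) {
      const double phi = 2.0 * pi * ia / azimuthSamples;
      const Vec3 d = {
        c * normal.x + s * (std::cos(phi) * t1.x + std::sin(phi) * t2.x),
        c * normal.y + s * (std::cos(phi) * t1.y + std::sin(phi) * t2.y),
        c * normal.z + s * (std::cos(phi) * t1.z + std::sin(phi) * t2.z)};
      if (isBlocked(d))
        return pi * (ip - 1) / polarSamples;   // apex angle of the clear cone
    }
  }
  return pi;   // fully accessible hemisphere
}
```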


Author(s):  
Shen Lu ◽  
Richard S. Segall

Big data is large-scale data and can be either discrete or continuous. This article discusses the continuous case of big data, often called "data streaming." More and more businesses will depend on being able to process and make decisions on streams of data. This article addresses the algorithmic side of data stream processing, often called "stream analytics" or "stream mining." Data stream Window Join can be improved by using a graphics processing unit (GPU) for higher-performance computing. Data streams are generated by two independent threads: one thread generates Data Stream A, and the other generates Data Stream B. A Window Join thread then merges the two data streams, a process known as "Data Stream Window Join." The Window Join process can be implemented in parallel, which efficiently improves the computing speed. Experiments are provided for Data Stream Window Joins using both static and dynamic data.
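A minimal CPU-thread sketch of the described setup (illustrative names, keys, and window size; the article's GPU parallelization is not shown) might look as follows:

```cpp
// Minimal sketch of the described setup: two producer threads generate
// streams A and B, and a window join matches tuples whose keys are equal and
// whose timestamps differ by at most `window`. The GPU version would evaluate
// the pairwise window comparisons in parallel.
#include <cstdio>
#include <cstdlib>
#include <deque>
#include <mutex>
#include <thread>

struct Tuple { long ts; int key; };

std::mutex mtx;
std::deque<Tuple> streamA, streamB;   // shared buffers fed by the producers

void produce(std::deque<Tuple>& stream, int seed, int count) {
  for (int i = 0; i < count; ++i) {
    std::lock_guard<std::mutex> lock(mtx);
    stream.push_back({static_cast<long>(i), (seed * 31 + i) % 100});
  }
}

// Nested-loop window join over the two buffered streams (serial reference).
int windowJoin(long window) {
  std::lock_guard<std::mutex> lock(mtx);
  int matches = 0;
  for (const Tuple& a : streamA)
    for (const Tuple& b : streamB)
      if (a.key == b.key && std::labs(a.ts - b.ts) <= window) ++matches;
  return matches;
}

int main() {
  std::thread ta(produce, std::ref(streamA), 1, 1000);   // Data Stream A
  std::thread tb(produce, std::ref(streamB), 2, 1000);   // Data Stream B
  ta.join();
  tb.join();
  std::printf("window join matches: %d\n", windowJoin(10));
  return 0;
}
```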


2013 ◽  
Vol 2013 ◽  
pp. 1-7 ◽  
Author(s):  
Congying Han ◽  
Tingting Feng ◽  
Guoping He ◽  
Tiande Guo

A modified parallel variable distribution (PVD) algorithm for solving large-scale constrained optimization problems is developed, which modifies the quadratic subproblem QP^l at each iteration instead of the QP^l_0 of the SQP-type PVD algorithm proposed by C. A. Sagastizábal and M. V. Solodov in 2002. The algorithm can circumvent the difficulties associated with the possible inconsistency of the QP^l_0 subproblem of the original SQP method. Moreover, we introduce a nonmonotone technique instead of the penalty function to carry out the line search procedure more flexibly. Under appropriate conditions, the global convergence of the method is established. In the final part, parallel numerical experiments are implemented in CUDA on a graphics processing unit (GPU).
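For context, the generic SQP quadratic subproblem is sketched below; this is only the standard form, shown to fix notation, and not the paper's specific modified subproblem QP^l or the PVD variant QP^l_0:

```latex
% Generic SQP quadratic subproblem (standard form, for notation only; the
% paper's QP^l and QP^l_0 are PVD-specific variants of such a subproblem).
\[
  \min_{d}\; \nabla f(x_k)^{\top} d + \frac{1}{2}\, d^{\top} H_k d
  \qquad \mbox{s.t.} \qquad g_i(x_k) + \nabla g_i(x_k)^{\top} d \le 0,
  \quad i = 1,\dots,m .
\]
```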

