GPU Implementation
Recently Published Documents


TOTAL DOCUMENTS: 522 (FIVE YEARS: 95)

H-INDEX: 28 (FIVE YEARS: 4)

Author(s):  
Liam Dunn ◽  
Patrick Clearwater ◽  
Andrew Melatos ◽  
Karl Wette

Abstract The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17.2 GPU-hours using the new implementation, compared to 1092 core-hours with the existing implementation.
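The resampling algorithm gains its speed by interpolating the detector data onto a grid that is uniform in the solar-system barycentre's time frame, after which a single FFT evaluates the statistic across many frequency bins at once. The sketch below shows only that interpolation step as a CUDA kernel, one thread per output sample; the toy signal, the linear interpolant, and all names are illustrative assumptions, and a production implementation additionally handles complex heterodyned data, spin-down corrections, and the final Fourier transform (e.g. via cuFFT).

```cuda
// Hedged sketch: resample a uniformly sampled detector-frame series onto
// barycentric sample times by linear interpolation, one thread per output.
#include <cuda_runtime.h>
#include <cmath>
#include <cstdio>
#include <vector>

__global__ void resample_kernel(const float* __restrict__ x,   // detector-frame samples
                                const double* __restrict__ tb, // barycentric time of each output sample
                                float* __restrict__ y,         // resampled output
                                int n_in, int n_out, double t0, double dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_out) return;
    double u = (tb[i] - t0) / dt;          // fractional index in the uniform input grid
    int k = (int)floor(u);
    if (k < 0 || k + 1 >= n_in) { y[i] = 0.0f; return; }  // outside the data span
    float w = (float)(u - k);
    y[i] = (1.0f - w) * x[k] + w * x[k + 1];              // linear interpolation
}

int main() {
    const int n = 1 << 20;
    const double t0 = 0.0, dt = 1.0 / 4096.0;             // illustrative sample rate
    std::vector<float> x(n);
    std::vector<double> tb(n);
    for (int i = 0; i < n; ++i) {
        x[i] = sinf(2.0f * (float)M_PI * 100.0f * (float)(i * dt));  // toy 100 Hz signal
        // Toy barycentric times: detector time plus a slowly varying Doppler delay.
        tb[i] = i * dt + 1e-4 * sin(2.0 * M_PI * i / n);
    }
    float *dx, *dy; double *dtb;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMalloc(&dtb, n * sizeof(double));
    cudaMemcpy(dx, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dtb, tb.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    resample_kernel<<<(n + 255) / 256, 256>>>(dx, dtb, dy, n, n, t0, dt);
    cudaDeviceSynchronize();
    std::vector<float> y(n);
    cudaMemcpy(y.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("y[12345] = %f\n", y[12345]);
    cudaFree(dx); cudaFree(dy); cudaFree(dtb);
    return 0;
}
```

Because every output sample is independent, the kernel is embarrassingly parallel and memory-bound, which is why this stage maps so well to GPUs.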


Electronics ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 11
Author(s):  
Xing Xie ◽  
Lin Bai ◽  
Xinming Huang

LiDAR has been widely used in autonomous driving systems to provide high-precision 3D geometric information about the vehicle's surroundings for perception, localization, and path planning. LiDAR-based point cloud semantic segmentation is an important task with a critical real-time requirement. However, most existing convolutional neural network (CNN) models for 3D point cloud semantic segmentation are very complex and can hardly be processed in real time on an embedded platform. In this study, a lightweight CNN structure was proposed for projection-based LiDAR point cloud semantic segmentation with only 1.9 M parameters, an 87% reduction compared to state-of-the-art networks. When evaluated on a GPU, the processing time was 38.5 ms per frame, and it achieved a 47.9% mIoU score on the Semantic-KITTI dataset. In addition, the proposed CNN was mapped onto an FPGA using the NVDLA architecture, resulting in a 2.74x speedup over the GPU implementation and a 46x improvement in power efficiency.
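Projection-based methods of this kind first rasterise each LiDAR sweep into a 2D "range image" and then run an ordinary 2D CNN on it. Below is a minimal CUDA sketch of that spherical projection step; the 64x2048 image size and the +3/-25 degree vertical field of view are common conventions for the HDL-64E sensor behind Semantic-KITTI, not details taken from this paper, and all names are illustrative.

```cuda
// Hedged sketch: project (x, y, z) points into a range image; the nearest
// point wins each pixel via atomicMin on the range's bit pattern (valid
// because positive IEEE-754 floats order the same as their unsigned bits).
#include <cuda_runtime.h>
#include <cmath>
#include <cstdio>
#include <vector>

__global__ void project_kernel(const float4* pts,        // (x, y, z, intensity)
                               unsigned int* depth_bits, // per-pixel encoded range
                               int n_pts, int H, int W)
{
    const float fov_up   =   3.0f * (float)M_PI / 180.0f;  // HDL-64E-style FOV (assumed)
    const float fov_down = -25.0f * (float)M_PI / 180.0f;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_pts) return;
    float4 p = pts[i];
    float r = sqrtf(p.x * p.x + p.y * p.y + p.z * p.z);
    if (r < 1e-3f) return;
    float yaw   = atan2f(p.y, p.x);   // azimuth in [-pi, pi]
    float pitch = asinf(p.z / r);     // elevation
    int col = (int)(0.5f * (1.0f - yaw / (float)M_PI) * W);
    int row = (int)((1.0f - (pitch - fov_down) / (fov_up - fov_down)) * H);
    if (col < 0 || col >= W || row < 0 || row >= H) return;
    atomicMin(&depth_bits[row * W + col], __float_as_uint(r));
}

int main() {
    const int H = 64, W = 2048, n = 100000;
    std::vector<float4> pts(n);
    for (int i = 0; i < n; ++i) {     // synthetic ring of points for the demo
        float a = 2.0f * (float)M_PI * i / n;
        pts[i] = make_float4(10.0f * cosf(a), 10.0f * sinf(a), -1.0f, 0.5f);
    }
    float4* dp; unsigned int* dd;
    cudaMalloc(&dp, n * sizeof(float4));
    cudaMalloc(&dd, H * W * sizeof(unsigned int));
    cudaMemcpy(dp, pts.data(), n * sizeof(float4), cudaMemcpyHostToDevice);
    cudaMemset(dd, 0xFF, H * W * sizeof(unsigned int));  // "empty" = max bits
    project_kernel<<<(n + 255) / 256, 256>>>(dp, dd, n, H, W);
    cudaDeviceSynchronize();
    std::vector<unsigned int> img(H * W);
    cudaMemcpy(img.data(), dd, H * W * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    int filled = 0;
    for (unsigned int v : img) filled += (v != 0xFFFFFFFFu);
    printf("filled pixels: %d / %d\n", filled, H * W);
    cudaFree(dp); cudaFree(dd);
    return 0;
}
```

Once the sweep is in image form, the segmentation network itself is plain 2D convolution, which is what makes both the GPU and the NVDLA FPGA targets practical.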


Author(s):  
Omer Anjum ◽  
Mohammad Almasri ◽  
Simon Garcia de Gonzalo ◽  
Wen-mei Hwu

2021 ◽  
Author(s):  
Jiamian Huang ◽  
Yasuaki Ito ◽  
Koji Nakano

2021 ◽  
Author(s):  
Abdulrahman Manea

Abstract Due to its simplicity, adaptability, and applicability to various grid formats, the restriction-smoothed basis multiscale method (MsRSB) (Møyner and Lie 2016) has received wide attention and has been extended to various flow problems in porous media. Unlike standard multiscale methods, MsRSB relies on iterative smoothing to find the multiscale basis functions in an adaptive manner, giving it the ability to adjust naturally to the complex grid orientations often encountered in real-life industrial applications. In this work, we investigate the scalability of MsRSB on various state-of-the-art parallel architectures, including multi-core systems and GPUs. While MsRSB, like most other multiscale methods, is directly amenable to parallelization, its dependence on a smoother to find the basis functions creates unique control- and data-flow patterns that require careful design and implementation to achieve good scalability in parallel environments. We extend the work on parallel multiscale methods in Manea et al. (2016) and Manea and Almani (2019) to map the special kernels of MsRSB to shared-memory multi-core and GPU architectures. The scalability of our optimized parallel MsRSB implementation is demonstrated using highly heterogeneous 3D problems derived from the SPE10 benchmark (Christie and Blunt 2001), ranging in size from millions to tens of millions of cells. The multi-core implementation is benchmarked on a shared-memory architecture with two Intel Cascade Lake Xeon Gold 6246 CPUs, while the GPU implementation is benchmarked on a massively parallel architecture consisting of NVIDIA Volta V100 GPUs. We compare the multi-core and GPU implementations for both the setup and solution stages. To the best of our knowledge, this is the first parallel implementation and demonstration of the versatile MsRSB method on the GPU architecture.
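The "iterative smoothing" at the heart of MsRSB is a damped Jacobi step, p <- p - omega * D^{-1} A p, applied to each basis function but restricted to that function's support region. The CUDA sketch below shows that kernel pattern under stated simplifications: a constant-coefficient five-point Laplacian stands in for the heterogeneous transmissibility-based system matrix, a single basis function is smoothed for a fixed number of sweeps, and the partition-of-unity renormalisation across neighbouring basis functions that full MsRSB performs after each sweep is omitted. All names are illustrative.

```cuda
// Hedged sketch: restricted damped-Jacobi smoothing of one MsRSB-style
// basis function on a 2D grid, with the update masked to a support region.
#include <cuda_runtime.h>
#include <cstdio>
#include <utility>
#include <vector>

__global__ void jacobi_smooth(const float* p_in, float* p_out,
                              const unsigned char* support,
                              int nx, int ny, float omega)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i >= nx || j >= ny) return;
    int idx = j * nx + i;
    if (!support[idx]) { p_out[idx] = 0.0f; return; }  // update only inside the support
    // Five-point Laplacian A*p (constant coefficients stand in for transmissibilities).
    float c = p_in[idx], ap = 0.0f;
    int diag = 0;
    if (i > 0)      { ap += c - p_in[idx - 1];  ++diag; }
    if (i < nx - 1) { ap += c - p_in[idx + 1];  ++diag; }
    if (j > 0)      { ap += c - p_in[idx - nx]; ++diag; }
    if (j < ny - 1) { ap += c - p_in[idx + nx]; ++diag; }
    p_out[idx] = c - omega * ap / (float)diag;  // damped Jacobi: p - omega * D^{-1} A p
}

int main() {
    const int nx = 64, ny = 64, N = nx * ny;
    std::vector<float> p(N, 0.0f);
    std::vector<unsigned char> sup(N, 0);
    for (int j = 0; j < ny; ++j)
        for (int i = 0; i < nx; ++i) {
            int idx = j * nx + i;
            sup[idx] = (i >= 8 && i < 56 && j >= 8 && j < 56);    // support region
            p[idx]   = (i >= 24 && i < 40 && j >= 24 && j < 40);  // indicator of the coarse block
        }
    float *d_a, *d_b; unsigned char* d_s;
    cudaMalloc(&d_a, N * sizeof(float));
    cudaMalloc(&d_b, N * sizeof(float));
    cudaMalloc(&d_s, N);
    cudaMemcpy(d_a, p.data(), N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_s, sup.data(), N, cudaMemcpyHostToDevice);
    dim3 block(16, 16), grid((nx + 15) / 16, (ny + 15) / 16);
    for (int it = 0; it < 100; ++it) {            // fixed sweep count for the sketch
        jacobi_smooth<<<grid, block>>>(d_a, d_b, d_s, nx, ny, 0.66f);
        std::swap(d_a, d_b);                      // ping-pong buffers between sweeps
    }
    cudaMemcpy(p.data(), d_a, N * sizeof(float), cudaMemcpyDeviceToHost);
    printf("basis at block centre: %f\n", p[32 * nx + 32]);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_s);
    return 0;
}
```

Ping-ponging two device buffers avoids a race between reading p_in and writing p_out within a sweep; the support mask is what gives MsRSB its distinctive control- and data-flow pattern compared with a plain smoother.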


2021 ◽  
Author(s):  
John D. Bartlett ◽  
Duane Storti

Abstract The rapid development of parallelization technology over recent decades has provided a promising avenue for accelerating meshfree simulation methods. One such method, peridynamics, is particularly well suited to parallelization due to the simplicity of the operations that must occur at each material point. However, while MPI-based (Message Passing Interface; CPU-based) parallelization of peridynamic problems is commonplace, GPU parallelization of peridynamics has received far less attention. While GPU technology may once have been an inferior option to MPI parallelization for peridynamics, modern GPU cards are more than capable of handling substantial peridynamics problems. This paper presents a parallelization of the peridynamic method for single-card GPU computing, providing a schematic for a compact parallel approach. The resulting method is tested with CUDA on an NVIDIA Tesla P100 card with 16 GB of memory. The per-node memory requirements of each data structure are evaluated, as are the per-node execution times of each operation in a million-node benchmark test. This setup is shown to provide speedup factors of over 200 for problems sized up to several million nodes, indicating that such a GPU is more than adequate for single-card parallelization of the peridynamic method.
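The per-point simplicity the abstract refers to is visible in the force loop of bond-based peridynamics: each material point just accumulates pairwise bond forces over its neighbours within the horizon, so one thread per node is a natural GPU mapping. A hedged CUDA sketch follows; the CSR-style neighbour list, the PMB-style constant micromodulus c (with nodal volumes folded in for brevity), and all names are illustrative assumptions, not the paper's data structures.

```cuda
// Hedged sketch: bond-based peridynamic force accumulation, one thread per
// material point, neighbours stored as a flat CSR-style list.
#include <cuda_runtime.h>
#include <cmath>
#include <cstdio>
#include <vector>

__global__ void bond_forces(const float3* __restrict__ x0,  // reference positions
                            const float3* __restrict__ u,   // displacements
                            const int* __restrict__ nbr,    // flat neighbour indices
                            const int* __restrict__ off,    // CSR offsets, length n+1
                            float3* __restrict__ f, int n, float c)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float3 fi = make_float3(0.f, 0.f, 0.f);
    for (int k = off[i]; k < off[i + 1]; ++k) {
        int j = nbr[k];
        // Reference bond and deformed bond vectors.
        float bx = x0[j].x - x0[i].x, by = x0[j].y - x0[i].y, bz = x0[j].z - x0[i].z;
        float ex = bx + u[j].x - u[i].x, ey = by + u[j].y - u[i].y, ez = bz + u[j].z - u[i].z;
        float L0 = sqrtf(bx * bx + by * by + bz * bz);
        float L  = sqrtf(ex * ex + ey * ey + ez * ez);
        float s  = (L - L0) / L0;   // bond stretch
        float m  = c * s / L;       // PMB-style force density along the deformed bond
        fi.x += m * ex; fi.y += m * ey; fi.z += m * ez;
    }
    f[i] = fi;
}

int main() {
    const int n = 1000;
    const float dx = 1e-3f, horizon = 3.015f * dx, c = 1.0e9f;  // illustrative constants
    std::vector<float3> x0(n), u(n);
    for (int i = 0; i < n; ++i) {
        x0[i] = make_float3(i * dx, 0.f, 0.f);          // 1D chain of nodes
        u[i]  = make_float3(1e-4f * i * dx, 0.f, 0.f);  // small uniform stretch
    }
    // Build the CSR neighbour list on the host (brute force is fine at this size).
    std::vector<int> off(n + 1, 0), nbr;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j)
            if (j != i && fabsf(x0[j].x - x0[i].x) <= horizon) nbr.push_back(j);
        off[i + 1] = (int)nbr.size();
    }
    float3 *dx0, *du, *df; int *dn, *doff;
    cudaMalloc(&dx0, n * sizeof(float3)); cudaMalloc(&du, n * sizeof(float3));
    cudaMalloc(&df, n * sizeof(float3));
    cudaMalloc(&dn, nbr.size() * sizeof(int)); cudaMalloc(&doff, (n + 1) * sizeof(int));
    cudaMemcpy(dx0, x0.data(), n * sizeof(float3), cudaMemcpyHostToDevice);
    cudaMemcpy(du, u.data(), n * sizeof(float3), cudaMemcpyHostToDevice);
    cudaMemcpy(dn, nbr.data(), nbr.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(doff, off.data(), (n + 1) * sizeof(int), cudaMemcpyHostToDevice);
    bond_forces<<<(n + 255) / 256, 256>>>(dx0, du, dn, doff, df, n, c);
    cudaDeviceSynchronize();
    std::vector<float3> f(n);
    cudaMemcpy(f.data(), df, n * sizeof(float3), cudaMemcpyDeviceToHost);
    printf("f[500].x = %e\n", f[500].x);
    cudaFree(dx0); cudaFree(du); cudaFree(df); cudaFree(dn); cudaFree(doff);
    return 0;
}
```

Because each thread writes only its own node's force, no atomics are needed; schemes that evaluate each bond once to halve the work would instead require atomic accumulation into both endpoints.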

