HIGH SPEED COMPUTING OF ICE THICKNESS EQUATION FOR ICE SHEET MODEL

2016 ◽  
Vol 78 (8-2) ◽  
Author(s):  
Norma Alias ◽  
Masyitah Mohd Saidi

A two-dimensional (2-D) coupled ice flow thermodynamics model plays a vital role in visualizing the ice sheet behaviour of the Antarctic region and the climate system. One of the parameters used in this model is ice thickness. An explicit finite difference method (FDM) is used to discretize the ice thickness equation. The discretized equation is then implemented in Compute Unified Device Architecture (CUDA) on a Graphics Processing Unit (GPU) platform. Demand for GPUs in computational problem solving has been increasing because of their low price and high-performance computation. This paper investigates the performance of GPU hardware, supported by CUDA parallel programming, in computing the large sparse system arising from the ice thickness equation of the 2-D ice flow thermodynamics model using multiple cores simultaneously and efficiently. Parallel performance evaluation (PPE) is carried out in terms of execution time, speedup, efficiency, effectiveness and temporal performance.
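
As a rough illustration of the scheme described above, the following is a minimal sketch of an explicit FDM update for a diffusion-type ice thickness equation, dH/dt = D * laplacian(H), on a uniform 2-D grid; the kernel, grid layout and parameter names are illustrative assumptions, not the authors' code.

```cuda
#include <cuda_runtime.h>

// Forward-Euler explicit FDM step for dH/dt = D * laplacian(H) on a
// uniform 2-D grid; one thread updates one interior grid point.
__global__ void iceThicknessStep(const float* H, float* Hnew,
                                 int nx, int ny,
                                 float D, float dt, float dx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < 1 || i >= nx - 1 || j < 1 || j >= ny - 1) return;

    int idx = j * nx + i;
    float lap = (H[idx - 1] + H[idx + 1] + H[idx - nx] + H[idx + nx]
                 - 4.0f * H[idx]) / (dx * dx);
    Hnew[idx] = H[idx] + dt * D * lap;   // explicit time advance
}
```

For a forward-Euler step of this form, stability requires dt on the order of dx²/(4D) or smaller, which is also what makes explicit schemes attractive on GPUs: each step is cheap and fully parallel, even though many steps are needed.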

2015 ◽  
Vol 13 (02) ◽  
pp. 1550010 ◽  
Author(s):  
Dakai Lin ◽  
Duan Huang ◽  
Peng Huang ◽  
Jinye Peng ◽  
Guihua Zeng

Reconciliation is a significant procedure in a continuous-variable quantum key distribution (CV-QKD) system. It is employed to extract a secure secret key from the raw string shared by the two users through the quantum channel. However, the efficiency and speed of previous reconciliation algorithms are low, which limits the secure communication distance and the secret key rate of CV-QKD systems. In this paper, we propose a high-speed reconciliation algorithm that employs a well-structured decoding scheme based on low-density parity-check (LDPC) codes. The complexity of the proposed algorithm is significantly reduced. Using a graphics processing unit (GPU) device, our method reaches a reconciliation speed of 25 Mb/s for a CV-QKD system, which is currently the highest level and paves the way to high-speed CV-QKD.
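
The per-iteration cost of such LDPC decoders is dominated by the check-node update, which maps naturally onto one GPU thread per check node. A minimal sketch of a min-sum check-node kernel follows; the regular-degree message layout and all names are assumptions, not the authors' implementation.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Min-sum check-node update for a regular LDPC code.
// vnMsg: variable-to-check messages, `deg` entries per check node.
// cnMsg: check-to-variable messages, written in the same layout.
__global__ void checkNodeMinSum(const float* vnMsg, float* cnMsg,
                                int numChecks, int deg)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= numChecks) return;

    const float* in  = vnMsg + (size_t)c * deg;
    float*       out = cnMsg + (size_t)c * deg;

    // Track the overall sign and the two smallest magnitudes.
    float sign = 1.0f, min1 = INFINITY, min2 = INFINITY;
    int minIdx = -1;
    for (int k = 0; k < deg; ++k) {
        float v = in[k];
        sign *= (v < 0.0f) ? -1.0f : 1.0f;
        float m = fabsf(v);
        if (m < min1)      { min2 = min1; min1 = m; minIdx = k; }
        else if (m < min2) { min2 = m; }
    }
    // Each outgoing message excludes the incoming edge itself:
    // sign * sign(in[k]) cancels edge k's sign contribution, and the
    // magnitude is min1 unless edge k itself held the minimum.
    for (int k = 0; k < deg; ++k) {
        float mag = (k == minIdx) ? min2 : min1;
        float s = sign * ((in[k] < 0.0f) ? -1.0f : 1.0f);
        out[k] = s * mag;
    }
}
```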


2020 ◽  
Vol 22 (5) ◽  
pp. 1198-1216
Author(s):  
Isabel Echeverribar ◽  
Mario Morales-Hernández ◽  
Pilar Brufau ◽  
Pilar García-Navarro

Abstract Coupled 1D2D models have emerged as an efficient solution for combining a two-dimensional (2D) representation of the floodplain with a fast one-dimensional (1D) schematization of the main channel. At the same time, high-performance computing (HPC) has appeared as an efficient tool for model acceleration. In this work, a previously validated 1D2D Central Processing Unit (CPU) model is combined with an HPC technique for fast and accurate flood simulation. Taking advantage of the speed of 1D schemes, a hybrid CPU/GPU model is presented that runs the 1D main channel on the CPU and accelerates the 2D floodplain with a Graphics Processing Unit (GPU). Since the data transfer between sub-domains and devices (CPU/GPU) may be the main potential drawback of this architecture, the test cases are selected to carry out a careful time analysis. The results reveal that the speed-up depends on the 2D mesh, the event to be solved and the 1D discretization of the main channel. Additionally, special attention must be paid to the computation of the time step size shared between sub-models. Despite the transfer overheads of a hybrid CPU/GPU implementation, high speed-ups are accomplished in some cases.
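
The coupling just described can be pictured as a time loop in which both sub-models agree on a common time step before advancing. The following is a minimal sketch under that reading, with stub solvers standing in for the real 1D and 2D schemes; all names are placeholders, not the authors' code.

```cuda
#include <cuda_runtime.h>
#include <algorithm>

// Stub 1D channel solver (CPU side); a real model advances the channel
// cross-sections here.
void  step1DChannelCPU(float dt) { (void)dt; }
float stableDt1D() { return 0.5f; }   // CFL limit of the 1D scheme (stub)

// Stub 2D floodplain update: one thread per cell.
__global__ void step2DFloodplain(float* h, int n, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) h[i] += dt * 0.0f;     // real flux update goes here
}
float stableDt2D() { return 0.8f; }   // CFL limit of the 2D scheme (stub)

int main()
{
    int n = 1 << 20;
    float* d_h; cudaMalloc((void**)&d_h, n * sizeof(float));
    for (int step = 0; step < 100; ++step) {
        // Shared time step: the minimum over both sub-models keeps the
        // coupled solution synchronized and stable.
        float dt = std::min(stableDt1D(), stableDt2D());
        step1DChannelCPU(dt);                                   // CPU
        step2DFloodplain<<<(n + 255) / 256, 256>>>(d_h, n, dt); // GPU
        cudaDeviceSynchronize();  // then exchange boundary data
    }
    cudaFree(d_h);
    return 0;
}
```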


Author(s):  
Alan Gray ◽  
Kevin Stratford

Leading high performance computing systems achieve their status through the use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data-parallel hardware in a platform-agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with the Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this through scaling results on traditional and graphics processing unit-accelerated large-scale supercomputers.
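
The general idea of such an abstraction layer can be illustrated with two macros that let one kernel body compile either as a CUDA kernel or as a plain CPU loop; these macros are illustrative only and are not targetDP's actual API.

```cuda
// Two illustrative macros: the same kernel body compiles as a CUDA
// kernel under nvcc or as a plain serial loop under a host compiler.
#ifdef __CUDACC__
  #define TARGET_KERNEL __global__
  #define TARGET_LOOP(i, n) \
      int i = blockIdx.x * blockDim.x + threadIdx.x; if (i >= (n)) return;
#else
  #define TARGET_KERNEL
  #define TARGET_LOOP(i, n) for (int i = 0; i < (n); ++i)
#endif

// A grid-based update written once against the abstraction; launched as
// scale<<<grid, block>>>(f, n, a) under CUDA, or called directly as
// scale(f, n, a) on the CPU.
TARGET_KERNEL void scale(float* field, int n, float a)
{
    TARGET_LOOP(i, n) { field[i] *= a; }
}
```

A real layer such as targetDP must also address concerns like memory layout for coalescing and vectorization, which this toy version omits.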


Electronics ◽  
2021 ◽  
Vol 10 (8) ◽  
pp. 884
Author(s):  
Stefano Rossi ◽  
Enrico Boni

Methods of increasing complexity are currently being proposed for ultrasound (US) echographic signal processing. Graphics Processing Unit (GPU) resources, which allow massive exploitation of parallel computing, are ideal candidates for these tasks. Many high-performance US instruments, including open scanners like the ULA-OP 256, have an architecture based only on Field-Programmable Gate Arrays (FPGAs) and/or Digital Signal Processors (DSPs). This paper proposes the integration of the embedded NVIDIA Jetson Xavier AGX module on board the ULA-OP 256. The system architecture was revised to allow the introduction of a new Peripheral Component Interconnect Express (PCIe) communication channel, while maintaining backward compatibility with all other embedded computing resources already on board. Moreover, the Input/Output (I/O) peripherals of the module make the ultrasound system independent, freeing the user from the need for an external controlling PC.
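
A key detail in keeping such a PCIe link busy is making transfers asynchronous. The sketch below shows the standard CUDA pattern of pinned host memory plus a stream, one plausible way of feeding echo data to a GPU module; the buffer size and names are assumptions, not part of the ULA-OP 256 design.

```cuda
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 64u * 1024 * 1024;  // one batch of RF samples (assumed)

    // Pinned (page-locked) host memory enables true asynchronous DMA
    // over PCIe, which keeps the link to the GPU module busy.
    short* h_rf; cudaHostAlloc((void**)&h_rf, bytes, cudaHostAllocDefault);
    short* d_rf; cudaMalloc((void**)&d_rf, bytes);

    cudaStream_t stream; cudaStreamCreate(&stream);

    // The copy can overlap with kernels queued on other streams.
    cudaMemcpyAsync(d_rf, h_rf, bytes, cudaMemcpyHostToDevice, stream);
    // ... enqueue US signal-processing kernels on `stream` here ...
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(d_rf);
    cudaFreeHost(h_rf);
    return 0;
}
```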


Author(s):  
Hui Huang ◽  
Jian Chen ◽  
Blair Carlson ◽  
Hui-Ping Wang ◽  
Paul Crooker ◽  
...  

Due to their enormous computational cost, current residual stress simulations of multipass girth welds are mostly performed using two-dimensional (2D) axisymmetric models. A 2D model can provide only a limited estimate of the residual stresses because it assumes an axisymmetric distribution. In this study, a highly efficient thermal-mechanical finite element code for three-dimensional (3D) models has been developed based on high-performance Graphics Processing Unit (GPU) computers. Our code is further accelerated by exploiting the unique physics of welding processes, which are characterized by a steep temperature gradient and a moving arc heat source. It is capable of modeling large-scale welding problems that cannot be easily handled by existing commercial simulation tools. To demonstrate its accuracy and efficiency, our code was compared with a commercial package by simulating a 3D multi-pass girth weld model with over 1 million elements. Our code achieved comparable solution accuracy but with an over 100-fold saving in computational cost. Moreover, the three-dimensional analysis revealed a more realistic stress distribution that is not axisymmetric in the hoop direction.
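
One welding-specific task that parallelizes well is re-evaluating the moving arc heat source at every node each time step. The kernel below sketches that idea for a simple Gaussian source; the source model and constants are illustrative, not the authors' formulation.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Heat input of a moving Gaussian arc source, evaluated at every node
// each time step; one thread per node. The source model and constants
// are illustrative, not taken from the paper.
__global__ void arcHeatSource(const float* x, const float* y, const float* z,
                              float* q, int nNodes,
                              float Q, float r0,   // arc power and radius
                              float vx, float t)   // torch speed and time
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nNodes) return;

    float xs = vx * t;                        // current torch position
    float dx = x[i] - xs, dy = y[i], dz = z[i];
    float r2 = dx * dx + dy * dy + dz * dz;

    // The Gaussian concentrates heat near the arc, producing the steep
    // temperature gradients the solver must resolve.
    float c = 3.0f / (r0 * r0);
    q[i] = Q * c / 3.14159265f * expf(-c * r2);
}
```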


2020 ◽  
Vol 16 (12) ◽  
pp. 7232-7238
Author(s):  
Giuseppe M. J. Barca ◽  
Jorge L. Galvez-Vallejo ◽  
David L. Poole ◽  
Alistair P. Rendell ◽  
Mark S. Gordon

2019 ◽  
Vol 2019 ◽  
pp. 1-11
Author(s):  
Younghun Park ◽  
Minwoo Gu ◽  
Sungyong Park

Advances in virtualization technology have enabled multiple virtual machines (VMs) to share resources in a physical machine (PM). With the widespread use of graphics-intensive applications, such as two-dimensional (2D) or 3D rendering, many graphics processing unit (GPU) virtualization solutions have been proposed to provide high-performance GPU services in a virtualized environment. Although elasticity is one of the major benefits of this environment, the allocation of GPU memory is still static: once GPU memory is allocated to a VM, its size cannot be changed at runtime. This causes either underutilization of GPU memory or, when an application requires a large amount of GPU memory, performance degradation due to the lack of it. In this paper, we propose a GPU memory ballooning solution called gBalloon that dynamically adjusts the GPU memory size at runtime according to the GPU memory requirement of each VM and the GPU memory sharing overhead. gBalloon extends the GPU memory size of a VM by detecting performance degradation due to the lack of GPU memory. It also reduces the GPU memory size when the overcommitted or underutilized GPU memory of a VM creates additional overhead for GPU context switches or CPU load due to GPU memory sharing among the VMs. We implemented gBalloon by modifying gVirt, a full GPU virtualization solution for Intel's integrated GPUs. Benchmarking results show that gBalloon dynamically adjusts the GPU memory size at runtime, improving performance by up to 8% against gVirt with 384 MB of high global graphics memory and by up to 32% against gVirt with 1024 MB of high global graphics memory.
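
The adjustment policy described above can be pictured as a control loop that inflates or deflates a VM's balloon based on two measured signals. The host-side sketch below illustrates such a policy under assumed thresholds; it is not gBalloon's or gVirt's actual code.

```cuda
#include <cstdio>

// Host-side sketch of a ballooning policy: grow a VM's GPU memory when
// memory shortage degrades performance, shrink it when the shared or
// overcommitted memory inflates context-switch / CPU overhead.
// Thresholds and fields are assumptions, not gBalloon's actual code.
struct VmGpuStats {
    double slowdown;       // degradation attributed to GPU memory shortage
    double shareOverhead;  // context-switch / CPU cost of memory sharing
    size_t allocMiB;       // current GPU memory allocation
};

size_t adjustBalloon(const VmGpuStats& s, size_t stepMiB, size_t maxMiB)
{
    if (s.slowdown > 0.05 && s.allocMiB + stepMiB <= maxMiB)
        return s.allocMiB + stepMiB;   // inflate: VM is memory-starved
    if (s.shareOverhead > 0.05 && s.allocMiB > stepMiB)
        return s.allocMiB - stepMiB;   // deflate: sharing costs too much
    return s.allocMiB;                 // steady state
}

int main()
{
    VmGpuStats s{0.10, 0.01, 384};
    std::printf("new allocation: %zu MiB\n", adjustBalloon(s, 64, 1024));
    return 0;
}
```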


2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Wenchao Zhang ◽  
Xinbin Dai ◽  
Shizhong Xu ◽  
Patrick X Zhao

Abstract Genome-wide association study (GWAS) is a powerful approach that has revolutionized the field of quantitative genetics. Two-dimensional GWAS, which accounts for epistatic genetic effects, must consider the effects of marker pairs, and hence quadratically many genetic variants, compared with one-dimensional GWAS that considers individual genetic variants. Calculating genome-wide kinship matrices in GWAS, which capture the relationships among individuals represented by ultra-high-dimensional genetic variants, is computationally challenging. Fortunately, kinship matrix calculation involves pure matrix operations, and the algorithms can be parallelized, particularly on graphics processing unit (GPU)-empowered high-performance computing (HPC) architectures. We have devised a new method and two pipelines, KMC1D and KMC2D, for kinship matrix calculation with high-dimensional genetic variants, facilitating 1D and 2D GWAS analyses, respectively. We first divide the ultra-high-dimensional markers and marker pairs into successive blocks. We then calculate the kinship matrix for each block and merge the block-wise kinship matrices into the genome-wide kinship matrix. All the matrix operations have been parallelized using GPU kernels on our NVIDIA GPU-accelerated server platform. The performance analyses show that the calculation of KMC1D and KMC2D can be accelerated by 100–400 times over conventional CPU-based computing.
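
The block-wise accumulation lends itself to one dense multiply per block. The sketch below expresses that step with cuBLAS, assuming a column-major n × b marker block and 1/m scaling; the layout, scaling and names are assumptions, not the KMC1D/KMC2D code.

```cuda
#include <cublas_v2.h>

// One block of the kinship accumulation K = (1/m) * sum_b Z_b * Z_b^T,
// where Z_b holds one block of markers for all n individuals
// (column-major, n x blockMarkers). Layout and scaling are assumptions.
void accumulateKinship(cublasHandle_t h, const float* dZblock, float* dK,
                       int n, int blockMarkers, int mTotal)
{
    const float alpha = 1.0f / (float)mTotal;  // marker-count scaling
    const float beta  = 1.0f;                  // accumulate across blocks
    // K (n x n) += alpha * Z_b (n x b) * Z_b^T (b x n)
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_T,
                n, n, blockMarkers,
                &alpha, dZblock, n, dZblock, n,
                &beta, dK, n);
}
```

The caller creates the cuBLAS handle, zero-initializes dK once, then streams successive marker blocks to the device and calls this routine for each; after the last block, dK holds the genome-wide kinship matrix.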

