HIGH SPEED COMPUTING OF ICE THICKNESS EQUATION FOR ICE SHEET MODEL

2016 ◽  
Vol 78 (8-2) ◽  
Author(s):  
Norma Alias ◽  
Masyitah Mohd Saidi

A two-dimensional (2-D) coupled ice flow thermodynamics model plays a vital role in visualizing the ice sheet behaviour of the Antarctic region and the climate system. One of the parameters used in this model is ice thickness. An explicit finite difference method (FDM) is used to discretize the ice thickness equation. The discretized equation is then implemented in Compute Unified Device Architecture (CUDA) on a Graphics Processing Unit (GPU) platform. Demand for GPUs in computational problem solving has been increasing because of their low price and high-performance computation. This paper investigates the performance of GPU hardware, supported by CUDA parallel programming, in computing the large sparse system arising from the ice thickness equation of the 2-D ice flow thermodynamics model using multiple cores simultaneously and efficiently. Parallel performance evaluation (PPE) is carried out in terms of execution time, speedup, efficiency, effectiveness and temporal performance.
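
As a rough illustration of the scheme described above, the following is a minimal sketch of an explicit FDM update for a diffusion-type ice thickness equation, dH/dt = D * laplacian(H), on a uniform 2-D grid; the kernel, grid layout and parameter names are illustrative assumptions, not the authors' code.

```cuda
#include <cuda_runtime.h>

// Forward-Euler explicit FDM step for dH/dt = D * laplacian(H) on a
// uniform 2-D grid; one thread updates one interior grid point.
__global__ void iceThicknessStep(const float* H, float* Hnew,
                                 int nx, int ny,
                                 float D, float dt, float dx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < 1 || i >= nx - 1 || j < 1 || j >= ny - 1) return;

    int idx = j * nx + i;
    float lap = (H[idx - 1] + H[idx + 1] + H[idx - nx] + H[idx + nx]
                 - 4.0f * H[idx]) / (dx * dx);
    Hnew[idx] = H[idx] + dt * D * lap;   // explicit time advance
}
```

For a forward-Euler step of this form, stability requires dt on the order of dx²/(4D) or smaller, which is also what makes explicit schemes attractive on GPUs: each step is cheap and fully parallel, even though many steps are needed.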

2015 ◽  
Vol 13 (02) ◽  
pp. 1550010 ◽  
Author(s):  
Dakai Lin ◽  
Duan Huang ◽  
Peng Huang ◽  
Jinye Peng ◽  
Guihua Zeng

Reconciliation is a significant procedure in a continuous-variable quantum key distribution (CV-QKD) system. It is employed to extract a secure secret key from the raw string shared by the two users through the quantum channel. However, the efficiency and speed of previous reconciliation algorithms are low, which limits the secure communication distance and the secret key rate of CV-QKD systems. In this paper, we propose a high-speed reconciliation algorithm that employs a well-structured decoding scheme based on low-density parity-check (LDPC) codes. The complexity of the proposed algorithm is significantly reduced. Using a graphics processing unit (GPU) device, our method reaches a reconciliation speed of 25 Mb/s for a CV-QKD system, which is currently the highest level and paves the way to high-speed CV-QKD.
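
The per-iteration cost of such LDPC decoders is dominated by the check-node update, which maps naturally onto one GPU thread per check node. A minimal sketch of a min-sum check-node kernel follows; the regular-degree message layout and all names are assumptions, not the authors' implementation.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Min-sum check-node update for a regular LDPC code.
// vnMsg: variable-to-check messages, `deg` entries per check node.
// cnMsg: check-to-variable messages, written in the same layout.
__global__ void checkNodeMinSum(const float* vnMsg, float* cnMsg,
                                int numChecks, int deg)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= numChecks) return;

    const float* in  = vnMsg + (size_t)c * deg;
    float*       out = cnMsg + (size_t)c * deg;

    // Track the overall sign and the two smallest magnitudes.
    float sign = 1.0f, min1 = INFINITY, min2 = INFINITY;
    int minIdx = -1;
    for (int k = 0; k < deg; ++k) {
        float v = in[k];
        sign *= (v < 0.0f) ? -1.0f : 1.0f;
        float m = fabsf(v);
        if (m < min1)      { min2 = min1; min1 = m; minIdx = k; }
        else if (m < min2) { min2 = m; }
    }
    // Each outgoing message excludes the incoming edge itself:
    // sign * sign(in[k]) cancels edge k's sign contribution, and the
    // magnitude is min1 unless edge k itself held the minimum.
    for (int k = 0; k < deg; ++k) {
        float mag = (k == minIdx) ? min2 : min1;
        float s = sign * ((in[k] < 0.0f) ? -1.0f : 1.0f);
        out[k] = s * mag;
    }
}
```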


2020 ◽  
Vol 22 (5) ◽  
pp. 1198-1216
Author(s):  
Isabel Echeverribar ◽  
Mario Morales-Hernández ◽  
Pilar Brufau ◽  
Pilar García-Navarro

Abstract Coupled 1D2D models have emerged as an efficient solution for combining a two-dimensional (2D) representation of the floodplain with a fast one-dimensional (1D) schematization of the main channel. At the same time, high-performance computing (HPC) has appeared as an efficient tool for model acceleration. In this work, a previously validated 1D2D Central Processing Unit (CPU) model is combined with an HPC technique for fast and accurate flood simulation. Taking advantage of the speed of 1D schemes, a hybrid CPU/GPU model is presented that runs the 1D main channel on the CPU and accelerates the 2D floodplain with a Graphics Processing Unit (GPU). Since the data transfer between sub-domains and devices (CPU/GPU) may be the main potential drawback of this architecture, the test cases are selected to carry out a careful time analysis. The results reveal that the speed-up depends on the 2D mesh, the event to be solved and the 1D discretization of the main channel. Additionally, special attention must be paid to the computation of the time step size shared between sub-models. Despite the transfer overheads of a hybrid CPU/GPU implementation, high speed-ups are accomplished in some cases.
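
The coupling just described can be pictured as a time loop in which both sub-models agree on a common time step before advancing. The following is a minimal sketch under that reading, with stub solvers standing in for the real 1D and 2D schemes; all names are placeholders, not the authors' code.

```cuda
#include <cuda_runtime.h>
#include <algorithm>

// Stub 1D channel solver (CPU side); a real model advances the channel
// cross-sections here.
void  step1DChannelCPU(float dt) { (void)dt; }
float stableDt1D() { return 0.5f; }   // CFL limit of the 1D scheme (stub)

// Stub 2D floodplain update: one thread per cell.
__global__ void step2DFloodplain(float* h, int n, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) h[i] += dt * 0.0f;     // real flux update goes here
}
float stableDt2D() { return 0.8f; }   // CFL limit of the 2D scheme (stub)

int main()
{
    int n = 1 << 20;
    float* d_h; cudaMalloc((void**)&d_h, n * sizeof(float));
    for (int step = 0; step < 100; ++step) {
        // Shared time step: the minimum over both sub-models keeps the
        // coupled solution synchronized and stable.
        float dt = std::min(stableDt1D(), stableDt2D());
        step1DChannelCPU(dt);                                   // CPU
        step2DFloodplain<<<(n + 255) / 256, 256>>>(d_h, n, dt); // GPU
        cudaDeviceSynchronize();  // then exchange boundary data
    }
    cudaFree(d_h);
    return 0;
}
```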


Author(s):  
Alan Gray ◽  
Kevin Stratford

Leading high performance computing systems achieve their status through the use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data-parallel hardware in a platform-agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with the Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this through scaling results on traditional and graphics processing unit-accelerated large-scale supercomputers.
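
The general idea of such an abstraction layer can be illustrated with two macros that let one kernel body compile either as a CUDA kernel or as a plain CPU loop; these macros are illustrative only and are not targetDP's actual API.

```cuda
// Two illustrative macros: the same kernel body compiles as a CUDA
// kernel under nvcc or as a plain serial loop under a host compiler.
#ifdef __CUDACC__
  #define TARGET_KERNEL __global__
  #define TARGET_LOOP(i, n) \
      int i = blockIdx.x * blockDim.x + threadIdx.x; if (i >= (n)) return;
#else
  #define TARGET_KERNEL
  #define TARGET_LOOP(i, n) for (int i = 0; i < (n); ++i)
#endif

// A grid-based update written once against the abstraction; launched as
// scale<<<grid, block>>>(f, n, a) under CUDA, or called directly as
// scale(f, n, a) on the CPU.
TARGET_KERNEL void scale(float* field, int n, float a)
{
    TARGET_LOOP(i, n) { field[i] *= a; }
}
```

A real layer such as targetDP must also address concerns like memory layout for coalescing and vectorization, which this toy version omits.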


Electronics ◽  
2021 ◽  
Vol 10 (8) ◽  
pp. 884
Author(s):  
Stefano Rossi ◽  
Enrico Boni

Methods of increasing complexity are currently being proposed for ultrasound (US) echographic signal processing. Graphics Processing Unit (GPU) resources, which allow massive exploitation of parallel computing, are ideal candidates for these tasks. Many high-performance US instruments, including open scanners like the ULA-OP 256, have an architecture based only on Field-Programmable Gate Arrays (FPGAs) and/or Digital Signal Processors (DSPs). This paper proposes the integration of the embedded NVIDIA Jetson Xavier AGX module on board the ULA-OP 256. The system architecture was revised to allow the introduction of a new Peripheral Component Interconnect Express (PCIe) communication channel, while maintaining backward compatibility with all other embedded computing resources already on board. Moreover, the Input/Output (I/O) peripherals of the module make the ultrasound system independent, freeing the user from the need for an external controlling PC.
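
A key detail in keeping such a PCIe link busy is making transfers asynchronous. The sketch below shows the standard CUDA pattern of pinned host memory plus a stream, one plausible way of feeding echo data to a GPU module; the buffer size and names are assumptions, not part of the ULA-OP 256 design.

```cuda
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 64u * 1024 * 1024;  // one batch of RF samples (assumed)

    // Pinned (page-locked) host memory enables true asynchronous DMA
    // over PCIe, which keeps the link to the GPU module busy.
    short* h_rf; cudaHostAlloc((void**)&h_rf, bytes, cudaHostAllocDefault);
    short* d_rf; cudaMalloc((void**)&d_rf, bytes);

    cudaStream_t stream; cudaStreamCreate(&stream);

    // The copy can overlap with kernels queued on other streams.
    cudaMemcpyAsync(d_rf, h_rf, bytes, cudaMemcpyHostToDevice, stream);
    // ... enqueue US signal-processing kernels on `stream` here ...
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(d_rf);
    cudaFreeHost(h_rf);
    return 0;
}
```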


Author(s):  
Hui Huang ◽  
Jian Chen ◽  
Blair Carlson ◽  
Hui-Ping Wang ◽  
Paul Crooker ◽  
...  

Due to their enormous computational cost, current residual stress simulations of multipass girth welds are mostly performed using two-dimensional (2D) axisymmetric models. A 2D model can provide only a limited estimate of the residual stresses because it assumes an axisymmetric distribution. In this study, a highly efficient thermal-mechanical finite element code for three-dimensional (3D) models has been developed based on high-performance Graphics Processing Unit (GPU) computers. Our code is further accelerated by exploiting the unique physics of welding processes, which are characterized by a steep temperature gradient and a moving arc heat source. It is capable of modeling large-scale welding problems that cannot be easily handled by existing commercial simulation tools. To demonstrate its accuracy and efficiency, our code was compared with a commercial package by simulating a 3D multi-pass girth weld model with over 1 million elements. Our code achieved comparable solution accuracy but with an over 100-fold saving in computational cost. Moreover, the three-dimensional analysis revealed a more realistic stress distribution that is not axisymmetric in the hoop direction.
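
One welding-specific task that parallelizes well is re-evaluating the moving arc heat source at every node each time step. The kernel below sketches that idea for a simple Gaussian source; the source model and constants are illustrative, not the authors' formulation.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Heat input of a moving Gaussian arc source, evaluated at every node
// each time step; one thread per node. The source model and constants
// are illustrative, not taken from the paper.
__global__ void arcHeatSource(const float* x, const float* y, const float* z,
                              float* q, int nNodes,
                              float Q, float r0,   // arc power and radius
                              float vx, float t)   // torch speed and time
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nNodes) return;

    float xs = vx * t;                        // current torch position
    float dx = x[i] - xs, dy = y[i], dz = z[i];
    float r2 = dx * dx + dy * dy + dz * dz;

    // The Gaussian concentrates heat near the arc, producing the steep
    // temperature gradients the solver must resolve.
    float c = 3.0f / (r0 * r0);
    q[i] = Q * c / 3.14159265f * expf(-c * r2);
}
```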


2020 ◽  
Vol 16 (12) ◽  
pp. 7232-7238
Author(s):  
Giuseppe M. J. Barca ◽  
Jorge L. Galvez-Vallejo ◽  
David L. Poole ◽  
Alistair P. Rendell ◽  
Mark S. Gordon

2019 ◽  
Vol 2019 ◽  
pp. 1-11
Author(s):  
Younghun Park ◽  
Minwoo Gu ◽  
Sungyong Park

Advances in virtualization technology have enabled multiple virtual machines (VMs) to share resources in a physical machine (PM). With the widespread use of graphics-intensive applications, such as two-dimensional (2D) or 3D rendering, many graphics processing unit (GPU) virtualization solutions have been proposed to provide high-performance GPU services in a virtualized environment. Although elasticity is one of the major benefits of this environment, the allocation of GPU memory is still static: once GPU memory is allocated to a VM, its size cannot be changed at runtime. This causes either underutilization of GPU memory or, when an application requires a large amount of GPU memory, performance degradation due to the lack of it. In this paper, we propose a GPU memory ballooning solution called gBalloon that dynamically adjusts the GPU memory size at runtime according to the GPU memory requirement of each VM and the GPU memory sharing overhead. gBalloon extends the GPU memory size of a VM by detecting performance degradation due to the lack of GPU memory. It also reduces the GPU memory size when the overcommitted or underutilized GPU memory of a VM creates additional overhead for GPU context switches or CPU load due to GPU memory sharing among the VMs. We implemented gBalloon by modifying gVirt, a full GPU virtualization solution for Intel's integrated GPUs. Benchmarking results show that gBalloon dynamically adjusts the GPU memory size at runtime, improving performance by up to 8% against gVirt with 384 MB of high global graphics memory and by up to 32% against gVirt with 1024 MB of high global graphics memory.
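
The adjustment policy described above can be pictured as a control loop that inflates or deflates a VM's balloon based on two measured signals. The host-side sketch below illustrates such a policy under assumed thresholds; it is not gBalloon's or gVirt's actual code.

```cuda
#include <cstdio>

// Host-side sketch of a ballooning policy: grow a VM's GPU memory when
// memory shortage degrades performance, shrink it when the shared or
// overcommitted memory inflates context-switch / CPU overhead.
// Thresholds and fields are assumptions, not gBalloon's actual code.
struct VmGpuStats {
    double slowdown;       // degradation attributed to GPU memory shortage
    double shareOverhead;  // context-switch / CPU cost of memory sharing
    size_t allocMiB;       // current GPU memory allocation
};

size_t adjustBalloon(const VmGpuStats& s, size_t stepMiB, size_t maxMiB)
{
    if (s.slowdown > 0.05 && s.allocMiB + stepMiB <= maxMiB)
        return s.allocMiB + stepMiB;   // inflate: VM is memory-starved
    if (s.shareOverhead > 0.05 && s.allocMiB > stepMiB)
        return s.allocMiB - stepMiB;   // deflate: sharing costs too much
    return s.allocMiB;                 // steady state
}

int main()
{
    VmGpuStats s{0.10, 0.01, 384};
    std::printf("new allocation: %zu MiB\n", adjustBalloon(s, 64, 1024));
    return 0;
}
```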


2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Wenchao Zhang ◽  
Xinbin Dai ◽  
Shizhong Xu ◽  
Patrick X Zhao

Abstract Genome-wide association study (GWAS) is a powerful approach that has revolutionized the field of quantitative genetics. Two-dimensional GWAS, which accounts for epistatic genetic effects, must consider the effects of marker pairs, and hence quadratically many genetic variants, compared with one-dimensional GWAS that considers individual genetic variants. Calculating genome-wide kinship matrices in GWAS, which capture the relationships among individuals represented by ultra-high-dimensional genetic variants, is computationally challenging. Fortunately, kinship matrix calculation involves pure matrix operations, and the algorithms can be parallelized, particularly on graphics processing unit (GPU)-empowered high-performance computing (HPC) architectures. We have devised a new method and two pipelines, KMC1D and KMC2D, for kinship matrix calculation with high-dimensional genetic variants, facilitating 1D and 2D GWAS analyses, respectively. We first divide the ultra-high-dimensional markers and marker pairs into successive blocks. We then calculate the kinship matrix for each block and merge the block-wise kinship matrices into the genome-wide kinship matrix. All the matrix operations have been parallelized using GPU kernels on our NVIDIA GPU-accelerated server platform. The performance analyses show that the calculation of KMC1D and KMC2D can be accelerated by 100–400 times over conventional CPU-based computing.
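
The block-wise accumulation lends itself to one dense multiply per block. The sketch below expresses that step with cuBLAS, assuming a column-major n × b marker block and 1/m scaling; the layout, scaling and names are assumptions, not the KMC1D/KMC2D code.

```cuda
#include <cublas_v2.h>

// One block of the kinship accumulation K = (1/m) * sum_b Z_b * Z_b^T,
// where Z_b holds one block of markers for all n individuals
// (column-major, n x blockMarkers). Layout and scaling are assumptions.
void accumulateKinship(cublasHandle_t h, const float* dZblock, float* dK,
                       int n, int blockMarkers, int mTotal)
{
    const float alpha = 1.0f / (float)mTotal;  // marker-count scaling
    const float beta  = 1.0f;                  // accumulate across blocks
    // K (n x n) += alpha * Z_b (n x b) * Z_b^T (b x n)
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_T,
                n, n, blockMarkers,
                &alpha, dZblock, n, dZblock, n,
                &beta, dK, n);
}
```

The caller creates the cuBLAS handle, zero-initializes dK once, then streams successive marker blocks to the device and calls this routine for each; after the last block, dK holds the genome-wide kinship matrix.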

