graphic processing units
Recently Published Documents


TOTAL DOCUMENTS: 178 (five years: 35)

H-INDEX: 15 (five years: 3)

2021 · Vol 4 · pp. 10-15
Author(s): Gennadii Malaschonok, Serhii Sukharskyi

With the development of Big Data and of the fields of study related to artificial intelligence, the need for fast and efficient computing has become one of the most important tasks of our time. That is why, over the recent decade, graphics processing unit (GPU) computing has been actively developing, giving scientists and developers the ability to use the thousands of cores a GPU provides for intensive computations. The goal of this research is to implement the orthogonal decomposition of a matrix by applying a series of Householder transformations in Java, using the JCuda library, and to study the benefits of this approach. Several related papers were examined. Malaschonok and Savchenko introduced an improved version of the QR algorithm for this purpose [4] and achieved better results; however, according to another team of researchers, Lahabar and Narayanan [6], the Householder algorithm is more promising for GPUs. They used single-precision (float) numbers, whereas we use double precision and are, in addition, developing a new BigDecimal type for CUDA. Moreover, there is still no solution for handling huge matrices, where errors in calculations might occur. The algorithm of orthogonal matrix decomposition, which forms the first part of the SVD algorithm, is investigated and implemented in this work. We present an implementation of matrix bidiagonalization and of the calculation of orthogonal factors by the Householder method in the JCuda environment on a graphics processor, together with an implementation of the same algorithm for the central processor for comparison. We experimentally measured the acceleration obtained by using the graphics processor relative to the CPU implementation, showing a speedup of up to 53 times on a large matrix (size 2048) and even better results on more advanced GPUs. At the same time, we still observe larger errors in the calculations on graphics processing units due to synchronization problems. We also compared execution on different platforms (Windows 10 and Arch Linux) and found that computation speeds are almost identical. The results show that better performance can be achieved on the GPU, although this approach brings more implementation difficulties.
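For readers unfamiliar with the method, the sketch below shows a plain-Java, CPU-only Householder QR step of the kind the paper accelerates. The class and variable names are illustrative assumptions and are not taken from the authors' JCuda implementation.

```java
// Minimal CPU sketch of QR decomposition via Householder reflections on a
// dense row-major matrix; illustrative only, not the paper's GPU code.
public final class HouseholderQR {

    /** Reduces a (rows x cols) matrix to upper-triangular R in place,
     *  applying one Householder reflector per column. */
    public static void decompose(double[][] a) {
        int rows = a.length, cols = a[0].length;
        for (int k = 0; k < Math.min(rows - 1, cols); k++) {
            // Build the reflector v for column k (zeroes everything below a[k][k]).
            double norm = 0.0;
            for (int i = k; i < rows; i++) norm += a[i][k] * a[i][k];
            norm = Math.sqrt(norm);
            if (norm == 0.0) continue;
            double alpha = a[k][k] > 0 ? -norm : norm; // sign choice avoids cancellation
            double[] v = new double[rows];
            v[k] = a[k][k] - alpha;
            for (int i = k + 1; i < rows; i++) v[i] = a[i][k];
            double vtv = 0.0;
            for (int i = k; i < rows; i++) vtv += v[i] * v[i];
            if (vtv == 0.0) continue;
            // Apply H = I - 2 v v^T / (v^T v) to the trailing columns.
            for (int j = k; j < cols; j++) {
                double dot = 0.0;
                for (int i = k; i < rows; i++) dot += v[i] * a[i][j];
                double scale = 2.0 * dot / vtv;
                for (int i = k; i < rows; i++) a[i][j] -= scale * v[i];
            }
        }
    }
}
```

On a GPU, the inner column updates are the natural target for parallelization, since each trailing column can be transformed independently.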


2021
Author(s): Charles Mackin, Malte Rasch, An Chen, Jonathan Timcheck, Robert Bruce, ...

Abstract: Analogue memory-based Deep Neural Networks (DNNs) provide energy-efficiency and per-area throughput gains relative to state-of-the-art digital counterparts such as graphic processing units (GPUs). Recent advances focus largely on hardware-aware algorithmic training and on improvements in circuits, architectures, and memory device characteristics. Optimal translation of software-trained weights into analogue hardware weights, given the plethora of complex memory non-idealities, represents an equally important goal in realizing the full potential of this technology. We report a generalized computational framework that automates the process of crafting complex weight programming strategies for analogue memory-based DNNs in order to minimize accuracy degradation during inference, particularly over time. The framework is agnostic to DNN structure and is shown to generalize well across Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNNs), and Transformer networks. Being a highly flexible numerical heuristic, our approach can accommodate arbitrary device-level complexity and is thus broadly applicable to a variety of analogue memories and their continually evolving device characteristics. Interestingly, this computational technique is capable of optimizing inference accuracy without the need to run inference simulations or to evaluate large training, validation, or test datasets. Lastly, by quantifying the limit of achievable inference accuracy given imperfections in analogue memory, weight programming optimization represents a unique and foundational tool for enabling analogue memory-based DNN accelerators to reach their full inference potential.
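As a concrete illustration of what "weight programming" can mean here, the toy sketch below picks a pair of conductance levels whose difference best matches a target weight under an assumed drift model. It is a hedged stand-in, not the authors' framework; the drift exponent, level set, and all names are invented for the example.

```java
// Toy weight-programming sketch (assumptions throughout, not the paper's
// method): choose (gPlus, gMinus) from a discrete level set so that the
// drifted difference stays close to the target weight.
import java.util.Arrays;

public final class WeightProgrammingSketch {

    /** Hypothetical drift model: conductances decay over time (nu assumed). */
    static double drifted(double g, double hoursElapsed) {
        double nu = 0.05; // assumed drift exponent, for illustration only
        return g * Math.pow(Math.max(hoursElapsed, 1.0), -nu);
    }

    /** Exhaustively searches the level set for the best differential pair. */
    static double[] program(double target, double[] levels, double evalHours) {
        double bestErr = Double.POSITIVE_INFINITY;
        double[] best = {0.0, 0.0};
        for (double gp : levels) {
            for (double gm : levels) {
                double w = drifted(gp, evalHours) - drifted(gm, evalHours);
                double err = Math.abs(w - target);
                if (err < bestErr) { bestErr = err; best = new double[]{gp, gm}; }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] levels = {0.0, 0.2, 0.4, 0.6, 0.8, 1.0}; // assumed level set
        System.out.println(Arrays.toString(program(0.35, levels, 24.0)));
    }
}
```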


Electronics · 2021 · Vol 10 (21) · pp. 2630
Author(s): Enrico M. Vitucci, Jonathan S. Lu, Scot Gordon, Jian Jet Zhu, Vittorio Degli-Esposti

In this work, the Discrete Environment-Driven Ray Launching (DED-RL) algorithm, which makes use of parallelization on Graphic Processing Units and was fully described in a previous paper, is validated against a large set of measurements to evaluate its performance in terms of both computational efficiency and accuracy. Three major urban areas are considered, including a very challenging scenario in central San Francisco that was used as a benchmark to test an image-based ray tracing algorithm in a previous work. The results show that DED-RL is as accurate as ray tracing despite a much lower computation time, reduced by more than three orders of magnitude with respect to ray tracing. Moreover, the accuracy depends only marginally on the discretization pixel size, at least within the considered range of pixel sizes. The unprecedented computational efficiency of DED-RL opens the way to numerous applications, ranging from RF coverage optimization of drone-aided cellular networks to efficient fingerprinting localization, as briefly discussed in the paper.


Hydrology · 2021 · Vol 8 (4) · pp. 146
Author(s): Javier Fernández-Pato, Pilar García-Navarro

Numerical simulation of flows that consider the interaction between overland flow and drainage networks has become a practical tool to prevent and mitigate flooding in urban environments, especially during intense storm events, when the limited capacity of the sewer system can trigger floods. Additionally, in order to prevent pollutant dispersion through the drainage network, it is desirable to monitor or control the quality of the water flowing in both domains. In this sense, adding a pollutant transport component to both the surface and the sewer hydraulic models benefits the global analysis of the combined water flow. On the other hand, when considering a realistic large domain with complex topography or street structure, a fine spatial discretization is mandatory. The number of grid cells is then usually very large, and parallelization techniques become necessary; the use of Graphic Processing Units (GPUs) is among the most efficient options, since it leverages thousands of processors within a single device. In this work, an efficient GPU-based 2D shallow water flow solver (RiverFlow2D-GPU) is fully coupled with EPA's Storm Water Management Model (SWMM). Both models are able to perform a transient water quality analysis that takes several pollutants into account. The coupled model, referred to as RiverFlow2D-GPU UD (Urban Drainage), is applied to three real-world cases covering the most common hydraulic situations in urban hydrology and hydraulics. A UK Environment Agency test case is used for model validation, showing good agreement between RiverFlow2D-GPU UD and the other numerical models considered. The efficiency of the model is proven in two more complex domains, leading to simulations more than 100x faster than the traditional CPU computation.
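The key mechanism in such a coupling, the two-way exchange of discharges between the 2D surface model and the 1D sewer model at inlets, can be sketched as below. The interfaces are hypothetical stand-ins written for illustration; they are not the RiverFlow2D-GPU or SWMM APIs.

```java
// Hedged sketch of a surface/sewer coupling step; interface and method
// names are invented stand-ins, not the actual solver APIs.
public final class CoupledStepSketch {

    interface SurfaceModel {                        // e.g. a 2D shallow-water solver
        void advance(double dt);
        double dischargeAtInlet(int inletId);       // m^3/s entering the sewer
        void addSourceAtInlet(int inletId, double q); // surcharge returned to surface
    }

    interface SewerModel {                          // e.g. a 1D drainage-network solver
        void advance(double dt);
        void setInflow(int nodeId, double q);
        double surcharge(int nodeId);               // flow returned when pipes are full
    }

    /** One coupled step: exchange inlet/surcharge discharges, then advance both. */
    static void coupledStep(SurfaceModel surface, SewerModel sewer,
                            int[] inlets, double dt) {
        for (int id : inlets) sewer.setInflow(id, surface.dischargeAtInlet(id));
        sewer.advance(dt);
        for (int id : inlets) surface.addSourceAtInlet(id, sewer.surcharge(id));
        surface.advance(dt);
    }
}
```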


2021 · Vol 8 (1)
Author(s): Pisit Makpaisit, Chantana Chantrapornchai

Abstract: Resource Description Framework (RDF) is commonly used as a standard for data interchange on the web. A collection of RDF data sets can form a large graph that is time-consuming to query. It is known that modern Graphic Processing Units (GPUs) can execute parallel programs to speed up the running time. In this paper, we propose a novel RDF data representation, along with a query processing algorithm, that is suitable for GPU processing. The main challenges of the GPU architecture are the limited memory size, the memory transfer latency, and the vast number of GPU cores; our system is therefore designed to make full use of the GPU cores while reducing the effect of memory transfers. We propose a representation consisting of indices and column-based RDF ID data that reduces the GPU memory requirement. Indexing and pre-upload filtering techniques are then applied to reduce the data transfer between host and GPU memory. We add an index swapping process to facilitate sorting and joining data on a given variable, and a pre-upload step to reduce the size of the result storage and the data transfer time. The experimental results show that our representation is about 35% smaller than the traditional NT format and 40% smaller than that of gStore. Query processing achieves speedups ranging from 1.95 to 397.03 compared with RDF-3X and gStore on the WatDiv test suite, and speedups of 578.57 and 62.97 on the LUBM benchmark compared with RDF-3X and gStore, respectively. The analysis shows which query cases can benefit from our approach.
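The general technique the representation builds on, dictionary-encoding RDF terms into integer IDs stored in per-role columns, can be sketched as follows; field and method names are assumptions, and the growth and index structures of the actual system are omitted.

```java
// Illustrative sketch of a dictionary-encoded, column-based triple layout;
// not the paper's actual data structures.
import java.util.HashMap;
import java.util.Map;

public final class ColumnTripleStore {
    private final Map<String, Integer> dictionary = new HashMap<>();
    private final int[] subjects, predicates, objects; // one column per triple role
    private int size;

    ColumnTripleStore(int capacity) {                  // capacity assumed sufficient
        subjects = new int[capacity];
        predicates = new int[capacity];
        objects = new int[capacity];
    }

    /** Maps an RDF term to a compact integer ID: far smaller than NT strings,
     *  and directly transferable to GPU memory as a plain int array. */
    private int id(String term) {
        return dictionary.computeIfAbsent(term, t -> dictionary.size());
    }

    void add(String s, String p, String o) {
        subjects[size] = id(s);
        predicates[size] = id(p);
        objects[size] = id(o);
        size++;
    }
}
```

Because each column is a contiguous int array, a pre-upload filter can select only the rows a query pattern needs before copying them to the device, which is the transfer-reduction idea the abstract describes.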


Author(s): Ossian O’Reilly, Te-Yang Yeh, Kim B. Olsen, Zhifeng Hu, Alex Breuer, ...

ABSTRACT: We developed a 3D elastic wave propagation solver that supports topography using staggered curvilinear grids. Our method achieves accuracy comparable to the classical fourth-order staggered-grid velocity–stress finite-difference method on a Cartesian grid. We show that the method is provably stable using summation-by-parts operators and boundary conditions imposed weakly via penalty terms. The maximum stable timestep obeys a relationship that depends on the topography-induced grid stretching along the vertical axis. The solutions from the approach are in excellent agreement with verified results for a Gaussian-shaped hill and for a complex topographic model. Compared with a Cartesian grid, the curvilinear grid adds negligible memory requirements but requires longer simulation times due to the smaller timesteps needed for complex topography. The code shows 94% weak scaling efficiency up to 1014 graphic processing units.
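The timestep remark is a CFL-type restriction; a hedged sketch of its generic form, assuming the stretching enters through the local vertical grid spacing (the exact constant and norm are solver-specific and not given in the abstract):

```latex
% Generic CFL-type bound; C, \Delta s_{i,j,k} and v_{p,\max} are assumed
% stand-ins for the solver-specific quantities.
\Delta t \;\le\; C \, \frac{\min_{i,j,k} \Delta s_{i,j,k}}{v_{p,\max}}
```

Here \(\Delta s_{i,j,k}\) is the local, topography-stretched grid spacing and \(v_{p,\max}\) the maximum P-wave speed; steep topography shrinks the minimum spacing and hence the stable timestep, which explains the longer simulation times reported above.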


Universe · 2021 · Vol 7 (7) · pp. 218
Author(s): Iuri La Rosa, Pia Astone, Sabrina D’Antonio, Sergio Frasca, Paola Leaci, ...

We present a new approach to searching for continuous gravitational waves (CWs) emitted by isolated rotating neutron stars, which exploits the highly parallel computing efficiency and computational power of modern Graphic Processing Units (GPUs). Specifically, this paper describes the porting of one of the algorithms used to search for CW signals, the so-called FrequencyHough transform, to the TensorFlow framework. The new code has been fully tested, and its performance on GPUs has been compared to that on a multicore CPU system of the same class, showing a factor-of-10 speed-up. This demonstrates that GPU programming with the general-purpose libraries of a high-level programming language (those of the TensorFlow framework) can significantly improve data analysis performance, opening new perspectives for wide-parameter searches for CWs.
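The idea behind a Hough-style search is that each detected time-frequency peak votes for every signal hypothesis consistent with it. The sketch below shows a generic accumulation into a frequency/spin-down plane, assuming a linear frequency model; it is an illustration of the idea, not the FrequencyHough implementation described in the paper.

```java
// Generic Hough-style accumulation from time-frequency peaks into a
// (f0, fdot) plane; illustrative, not the paper's FrequencyHough code.
public final class HoughSketch {

    /** Each peak (t, f) votes for all (f0, fdot) with f = f0 + fdot * t. */
    static int[][] accumulate(double[] peakTimes, double[] peakFreqs,
                              double f0Min, double df, int nF,
                              double fdotMin, double dFdot, int nFdot) {
        int[][] count = new int[nFdot][nF];
        for (int p = 0; p < peakTimes.length; p++) {
            for (int j = 0; j < nFdot; j++) {
                double fdot = fdotMin + j * dFdot;
                double f0 = peakFreqs[p] - fdot * peakTimes[p]; // consistent f0
                int i = (int) Math.round((f0 - f0Min) / df);
                if (i >= 0 && i < nF) count[j][i]++;            // one vote
            }
        }
        return count;                                           // peaks pile up at the true signal
    }
}
```

The two nested loops are independent across peaks and spin-down bins, which is what makes the transform map so naturally onto GPU tensor operations.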


Materials · 2021 · Vol 14 (12) · pp. 3291
Author(s): Fulu Zheng, Lipeng Chen, Jianbo Gao, Yang Zhao

It has long been a challenge to simulate exciton–phonon dynamics in mesoscale photosynthetic systems accurately and efficiently with a fully quantum mechanical treatment, due to the extensive computational resources required. In this work, we tackle this seemingly intractable problem by combining the Dirac–Frenkel time-dependent variational method with Davydov trial states and implementing the algorithm on graphic processing units. The phonons are treated on the same footing as the exciton. Tested with toy models, namely nanoarrays of the B850 pigments from the light-harvesting 2 complexes of purple bacteria, the methodology is adopted to describe exciton diffusion in huge systems containing more than 1600 molecules. The superradiance enhancement factor extracted from the simulations indicates exciton delocalization over two to three pigments, in agreement with measurements of fluorescence quantum yield and lifetime in B850 systems. A fractal analysis of the exciton dynamics shows that exciton transfer in B850 nanoarrays exhibits a superdiffusive component for about 500 fs. Treating the B850 ring as an aggregate and modeling the inter-ring exciton transfer as incoherent hopping, we also apply the method of classical master equations to estimate exciton diffusion properties in one-dimensional (1D) and two-dimensional (2D) B850 nanoarrays, using derived analytical expressions for the time-dependent excitation probabilities. For both coherent and incoherent propagation, faster energy transfer is found in 2D nanoarrays than in 1D chains, owing to the availability of more propagation channels in the 2D arrangement.
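The incoherent-hopping picture invoked in the last step can be illustrated with a classical master equation on a 1D chain. The sketch below uses forward-Euler integration and an invented nearest-neighbour rate; it is a generic illustration of the technique, not the authors' derived analytical expressions.

```java
// Hedged sketch of the classical-master-equation picture for inter-ring
// hopping: forward-Euler integration of dP/dt = K P on a 1D chain with a
// nearest-neighbour rate k (value and step size are assumptions; dt must
// satisfy 2*k*dt < 1 for this explicit scheme to stay stable).
public final class MasterEquationSketch {

    /** Propagates site populations P under nearest-neighbour hopping rate k. */
    static double[] propagate(int nSites, int startSite, double k,
                              double dt, int nSteps) {
        double[] p = new double[nSites];
        p[startSite] = 1.0;                        // excitation starts on one ring
        double[] next = new double[nSites];
        for (int step = 0; step < nSteps; step++) {
            for (int i = 0; i < nSites; i++) {
                double in = 0.0, out = 0.0;
                if (i > 0)          { in += k * p[i - 1]; out += k; }
                if (i < nSites - 1) { in += k * p[i + 1]; out += k; }
                next[i] = p[i] + dt * (in - out * p[i]); // gain minus loss
            }
            System.arraycopy(next, 0, p, 0, nSites);
        }
        return p;                                  // time-dependent excitation probabilities
    }
}
```

The 2D case adds two more neighbours per site, i.e. more propagation channels, which is the mechanism the abstract cites for the faster transfer in 2D nanoarrays.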

