Implementation of algebraic procedures on the GPU using CUDA architecture on the example of generalized eigenvalue problem

Łukasz Syrocki; Grzegorz Pestka

doi:10.1515/comp-2016-0006

Implementation of algebraic procedures on the GPU using CUDA architecture on the example of generalized eigenvalue problem

Open Computer Science ◽

10.1515/comp-2016-0006 ◽

2016 ◽

Vol 6 (1) ◽

pp. 79-90

Author(s):

Łukasz Syrocki ◽

Grzegorz Pestka

Keyword(s):

Eigenvalue Problem ◽

Graphics Processing Unit ◽

Generalized Eigenvalue Problem ◽

Processing Unit ◽

Graphics Processors ◽

Central Processing ◽

Generalized Eigenvalue ◽

Cuda Technology ◽

Cuda Architecture ◽

High Level

AbstractThe ready to use set of functions to facilitate solving a generalized eigenvalue problem for symmetric matrices in order to efficiently calculate eigenvalues and eigenvectors, using Compute Unified Device Architecture (CUDA) technology from NVIDIA, is provided. An integral part of the CUDA is the high level programming environment enabling tracking both code executed on Central Processing Unit and on Graphics Processing Unit. The presented matrix structures allow for the analysis of the advantages of using graphics processors in such calculations.

Download Full-text

Analysis of Fast Fourier Transformations algorithm for CUDA Architecture

Lietuvos matematikos rinkinys ◽

10.15388/lmr.b.2012.46 ◽

2012 ◽

Vol 53 ◽

Author(s):

Beatričė Andziulienė ◽

Evaldas Žulkas ◽

Audrius Kuprinavičius

Keyword(s):

Graphics Processing Unit ◽

General Purpose ◽

Fast Fourier Transformation ◽

Processing Unit ◽

Data Allocation ◽

Analysis Method ◽

Central Processing ◽

Execution Speed ◽

Cuda Architecture ◽

Graphics Processing

In this work Fast Fourier transformation algorithm for general purpose graphics processing unit processing (GPGPU) is discussed. Algorithm structure and individual stages performance were analysed. With performance analysis method algorithm distribution and data allocation possibilities were determined, depending on algorithm stages execution speed and algorithm structure. Ratio between CPU and GPU execution during Fast Fourier transform signal processing was determined using computer-generated data with frequency. When adopting CPU code for CUDA execution, it not becomes more complex, even if stream procesor parallelization and data transfering algorith stages are considered. But central processing unit serial execution).

Download Full-text

High Performance GPU-Based Fourier Volume Rendering

International Journal of Biomedical Imaging ◽

10.1155/2015/590727 ◽

2015 ◽

Vol 2015 ◽

pp. 1-13 ◽

Cited By ~ 5

Author(s):

Marwan Abdellah ◽

Ayman Eldeib ◽

Amr Sharawi

Keyword(s):

Volume Rendering ◽

High Performance ◽

Graphics Processing Unit ◽

Rapid Evolution ◽

Processing Unit ◽

Central Processing ◽

Slice Theorem ◽

Cuda Technology ◽

Gpu Architectures ◽

3D Volume

Fourier volume rendering (FVR) is a significant visualization technique that has been used widely in digital radiography. As a result of itsO(N2log⁡N)time complexity, it provides a faster alternative to spatial domain volume rendering algorithms that areO(N3)computationally complex. Relying on theFourier projection-slice theorem, this technique operates on the spectral representation of a 3D volume instead of processing its spatial representation to generate attenuation-only projections that look likeX-ray radiographs. Due to the rapid evolution of its underlying architecture, the graphics processing unit (GPU) became an attractive competent platform that can deliver giant computational raw power compared to the central processing unit (CPU) on a per-dollar-basis. The introduction of the compute unified device architecture (CUDA) technology enables embarrassingly-parallel algorithms to run efficiently on CUDA-capable GPU architectures. In this work, a high performance GPU-accelerated implementation of the FVR pipeline on CUDA-enabled GPUs is presented. This proposed implementation can achieve a speed-up of 117x compared to a single-threaded hybrid implementation that uses the CPU and GPU together by taking advantage of executing the rendering pipeline entirely on recent GPU architectures.

Download Full-text

On the global convergence of the block Jacobi method for the positive definite generalized eigenvalue problem

CALCOLO ◽

10.1007/s10092-021-00415-8 ◽

2021 ◽

Vol 58 (2) ◽

Author(s):

Vjeran Hari

Keyword(s):

Global Convergence ◽

Eigenvalue Problem ◽

Positive Definite ◽

Generalized Eigenvalue Problem ◽

Jacobi Method ◽

Generalized Eigenvalue

Download Full-text

Numerical simulation of flattened heat pipe with double heat sources for CPU and GPU cooling application in laptop computers

Journal of Computational Design and Engineering ◽

10.1093/jcde/qwaa091 ◽

2020 ◽

Author(s):

Wisoot Sanhan ◽

Kambiz Vafai ◽

Niti Kammuang-Lue ◽

Pradit Terdtoon ◽

Phrut Sakulchangsatjatai

Keyword(s):

Experimental Data ◽

Heat Pipe ◽

Graphics Processing Unit ◽

Processing Unit ◽

Heat Sources ◽

Final Thickness ◽

Laptop Computers ◽

Central Processing ◽

Graphics Processing ◽

Good Agreement

Abstract An investigation of the effect of the thermal performance of the flattened heat pipe on its double heat sources acting as central processing unit and graphics processing unit in laptop computers is presented in this work. A finite element method is used for predicting the flattening effect of the heat pipe. The cylindrical heat pipe with a diameter of 6 mm and the total length of 200 mm is flattened into three final thicknesses of 2, 3, and 4 mm. The heat pipe is placed under a horizontal configuration and heated with heater 1 and heater 2, 40 W in combination. The numerical model shows good agreement compared with the experimental data with the standard deviation of 1.85%. The results also show that flattening the cylindrical heat pipe to 66.7 and 41.7% of its original diameter could reduce its normalized thermal resistance by 5.2%. The optimized final thickness or the best design final thickness for the heat pipe is found to be 2.5 mm.

Download Full-text

A Parallel-Computing Approach for Vector Road-Network Matching Using GPU Architecture

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi7120472 ◽

2018 ◽

Vol 7 (12) ◽

pp. 472 ◽

Cited By ~ 1

Author(s):

Bo Wan ◽

Lin Yang ◽

Shunping Zhou ◽

Run Wang ◽

Dezhi Wang ◽

...

Keyword(s):

Road Network ◽

Large Scale ◽

Graphics Processing Unit ◽

Road Networks ◽

Processing Unit ◽

Data Partition ◽

Matching Method ◽

The Road ◽

Central Processing ◽

Relaxation Matching

The road-network matching method is an effective tool for map integration, fusion, and update. Due to the complexity of road networks in the real world, matching methods often contain a series of complicated processes to identify homonymous roads and deal with their intricate relationship. However, traditional road-network matching algorithms, which are mainly central processing unit (CPU)-based approaches, may have performance bottleneck problems when facing big data. We developed a particle-swarm optimization (PSO)-based parallel road-network matching method on graphics-processing unit (GPU). Based on the characteristics of the two main stages (similarity computation and matching-relationship identification), data-partition and task-partition strategies were utilized, respectively, to fully use GPU threads. Experiments were conducted on datasets with 14 different scales. Results indicate that the parallel PSO-based matching algorithm (PSOM) could correctly identify most matching relationships with an average accuracy of 84.44%, which was at the same level as the accuracy of a benchmark—the probability-relaxation-matching (PRM) method. The PSOM approach significantly reduced the road-network matching time in dealing with large amounts of data in comparison with the PRM method. This paper provides a common parallel algorithm framework for road-network matching algorithms and contributes to integration and update of large-scale road-networks.

Download Full-text

A Cache-Aware Implementation of the Spectral Divide-and-Conquer Approach for the Non-Symmetric Generalized Eigenvalue Problem

PAMM ◽

10.1002/pamm.201410390 ◽

2014 ◽

Vol 14 (1) ◽

pp. 819-820 ◽

Cited By ~ 1

Author(s):

Peter Benner ◽

Martin Köhler ◽

Jens Saak

Keyword(s):

Eigenvalue Problem ◽

Generalized Eigenvalue Problem ◽

Divide And Conquer ◽

Generalized Eigenvalue

Download Full-text

Geometric Invariant Theory and Generalized Eigenvalue Problem II

Annales de l’institut Fourier ◽

10.5802/aif.2647 ◽

2011 ◽

Vol 61 (4) ◽

pp. 1467-1491 ◽

Cited By ~ 5

Author(s):

Nicolas Ressayre

Keyword(s):

Eigenvalue Problem ◽

Invariant Theory ◽

Geometric Invariant Theory ◽

Generalized Eigenvalue Problem ◽

Geometric Invariant ◽

Generalized Eigenvalue

Download Full-text

6. The Generalized Eigenvalue Problem

The Matrix Eigenvalue Problem ◽

10.1137/1.9780898717808.ch6 ◽

2007 ◽

pp. 233-263 ◽

Cited By ~ 2

Keyword(s):

Eigenvalue Problem ◽

Generalized Eigenvalue Problem ◽

Generalized Eigenvalue

Download Full-text

Iterative Calculations of a Few Lowest Eigenvalues and Corresponding Eigenvectors of Large Generalized Eigenvalue Problem

Acta Physico-Chimica Sinica ◽

10.3866/pku.whxb20080515 ◽

2008 ◽

Vol 24 (05) ◽

pp. 823-826

Author(s):

ZHAO Xiao-Hong ◽

◽

CHEN Fei-Wu ◽

WU Jian ◽

...

Keyword(s):

Eigenvalue Problem ◽

Generalized Eigenvalue Problem ◽

Generalized Eigenvalue

Download Full-text

Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence

Information ◽

10.3390/info11040193 ◽

2020 ◽

Vol 11 (4) ◽

pp. 193 ◽

Cited By ~ 7

Author(s):

Sebastian Raschka ◽

Joshua Patterson ◽

Corey Nolet

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Data Science ◽

Gpu Computing ◽

Graphics Processing Unit ◽

General Purpose ◽

Processing Unit ◽

The Core ◽

Critical Components ◽

High Level

Smarter applications are making better use of the insights gleaned from data, having an impact on every industry and research discipline. At the core of this revolution lies the tools and the methods that are driving it, from processing the massive piles of data generated each day to learning from and taking useful action. Deep neural networks, along with advancements in classical machine learning and scalable general-purpose graphics processing unit (GPU) computing, have become critical components of artificial intelligence, enabling many of these astounding breakthroughs and lowering the barrier to adoption. Python continues to be the most preferred language for scientific computing, data science, and machine learning, boosting both performance and productivity by enabling the use of low-level libraries and clean high-level APIs. This survey offers insight into the field of machine learning with Python, taking a tour through important topics to identify some of the core hardware and software paradigms that have enabled it. We cover widely-used libraries and concepts, collected together for holistic comparison, with the goal of educating the reader and driving the field of Python machine learning forward.

Download Full-text