Multiple-Precision Residue-Based Arithmetic Library for Parallel CPU-GPU Architectures: Data Types and Features

Author(s):  
Konstantin Isupov ◽  
Alexander Kuvaev ◽  
Mikhail Popov ◽  
Anton Zaviyalov
2020 ◽  
Author(s):  
Konstantin Isupov ◽  
Vladimir Knyazkov

The binary32 and binary64 floating-point formats provide good performance on current hardware, but also introduce a rounding error in almost every arithmetic operation. Consequently, the accumulation of rounding errors in large computations can cause accuracy issues. One way to prevent these issues is to use multiple-precision floating-point arithmetic. This preprint, submitted to Russian Supercomputing Days 2020, presents a new library of basic linear algebra operations with multiple precision for graphics processing units. The library is written in CUDA C/C++ and uses the residue number system to represent multiple-precision significands of floating-point numbers. The supported data types, memory layout, and main features of the library are considered. Experimental results are presented showing the performance of the library.
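As a rough illustration of the residue number system approach the abstract describes, a value can be held as its residues modulo a set of pairwise coprime moduli, which makes multiplication carry-free across channels. The moduli and functions below are a minimal sketch for exposition only, not the library's actual CUDA implementation, moduli sets, or memory layout:

```python
# Illustrative moduli only; a real multiple-precision library uses
# larger, carefully chosen moduli sets.
MODULI = (7, 11, 13, 17)  # pairwise coprime; dynamic range 7*11*13*17 = 17017

def to_rns(x):
    """Represent x by its residues modulo each modulus."""
    return tuple(x % m for m in MODULI)

def rns_mul(a, b):
    """Multiply channel-wise: each residue channel is independent,
    so no carries propagate between channels."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))

def from_rns(r):
    """Recover the integer via the Chinese remainder theorem."""
    M = 1
    for m in MODULI:
        M *= m
    return sum(ri * (M // m) * pow(M // m, -1, m)
               for ri, m in zip(r, MODULI)) % M

# 123 * 45 = 5535, well inside the dynamic range:
assert from_rns(rns_mul(to_rns(123), to_rns(45))) == 5535
```

The carry-free, channel-independent arithmetic is the general motivation for using RNS on parallel hardware: each residue channel of a significand can, in principle, be processed by a separate thread.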


Author(s):  
Kristina Mihajlenko ◽  
Mikhail Lukin ◽  
Andrey Stankevich

Introduction: Decompilers are useful tools for software analysis and support in the absence of source code. They are available for many hardware architectures and programming languages. However, none of the existing decompilers support modern AMD GPU architectures such as AMD GCN and RDNA. Purpose: We aim to develop the first assembly decompiler tool for a modern AMD GPU architecture that generates code in the OpenCL language, which is widely used for programming GPGPUs. Results: We developed algorithms for the following operations: preprocessing assembly code, searching for data accesses, extracting system values, decompiling arithmetic operations, and recovering data types. We also developed templates for the decompilation of branching operations. Practical relevance: We implemented the presented algorithms in Python as a tool called OpenCLDecompiler, which supports a large subset of AMD GCN instructions. The tool automatically converts disassembled GPGPU code into equivalent OpenCL code, which reduces the effort required to analyze assembly code.
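The template-driven decompilation of arithmetic instructions can be illustrated with a toy sketch. The mnemonics, register names, and templates below are simplified stand-ins for exposition, not OpenCLDecompiler's actual instruction tables:

```python
# Hypothetical per-mnemonic templates mapping a three-operand
# vector instruction to a C/OpenCL-style assignment.
TEMPLATES = {
    "v_add_f32": "{d} = {a} + {b};",
    "v_sub_f32": "{d} = {a} - {b};",
    "v_mul_f32": "{d} = {a} * {b};",
}

def decompile_line(asm_line):
    """Translate one disassembled instruction into OpenCL-like C."""
    mnemonic, operands = asm_line.split(maxsplit=1)
    d, a, b = (op.strip() for op in operands.split(","))
    return TEMPLATES[mnemonic].format(d=d, a=a, b=b)

print(decompile_line("v_mul_f32 v2, v0, v1"))  # v2 = v0 * v1;
```

A real decompiler layers much more on top of this, e.g. register-to-variable mapping, type recovery, and control-flow reconstruction, but the per-instruction translation step follows this pattern.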


2021 ◽  
Author(s):  
Alexssandro Ferreira Cordeiro ◽  
Pedro Luiz de Paula Filho ◽  
Hamilton Pereira Silva ◽  
Arnaldo Candido Junior ◽  
Edresson Casanova ◽  
...  

Purpose: to analyze the processing time and the similarity of the images generated on CPU and GPU architectures under sequential and parallel programming methodologies. Material and methods: image processing was performed on a computer with an AMD FX-8350 processor and an Nvidia GTX 960 (Maxwell) GPU, using the CUDAFY library and the C# programming language in the Visual Studio IDE. Results: the comparisons indicate that sequential programming on the CPU generates reliable images at a high cost in time compared with the parallel CPU and GPU approaches, while parallel programming produces results faster but with increased noise in the reconstructed image. For the float data type, the GPU obtained the best result, with an average time of about one third of the processor's; for the double data type, the parallel CPU approach performed best. Conclusion: for the float data type, the GPU had the best average time performance, while for the double data type the best average time performance belonged to the parallel CPU approach. Regarding image quality, the sequential approach produced similar outputs, while the parallel approaches generated noise in their outputs.


2021 ◽  
Author(s):  
Dylan Jude ◽  
Jay Sitaraman ◽  
Andrew M. Wissink

2018 ◽  
Author(s):  
Prathiba Natesan ◽  
Smita Mehta

Single case experimental designs (SCEDs) have become an indispensable methodology where randomized controlled trials may be impossible or even inappropriate. However, the nature of SCED data presents challenges for both visual and statistical analyses. Small sample sizes, autocorrelations, data types, and design types render many parametric statistical analyses and maximum likelihood approaches ineffective. The presence of autocorrelation decreases interrater reliability in visual analysis. The purpose of the present study is to demonstrate a newly developed model called the Bayesian unknown change-point (BUCP) model, which overcomes all the above-mentioned data analytic challenges. This is the first study to formulate and demonstrate a rate ratio effect size for autocorrelated data, which has remained an open question in SCED research until now. This expository study also compares and contrasts the results from the BUCP model with visual analysis, and the rate ratio effect size with the nonoverlap of all pairs (NAP) effect size. Data from a comprehensive behavioral intervention are used for the demonstration.
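For a rough sense of the two ideas the abstract combines, an unknown change point and a rate ratio effect size, here is a deliberately simplified, non-Bayesian sketch with hypothetical session counts. The BUCP model itself is Bayesian and accounts for autocorrelation, none of which is reproduced here:

```python
# Toy change-point estimate: pick the split that minimizes the pooled
# squared error around each segment's mean (a crude least-squares stand-in
# for the Bayesian posterior over change points).
def estimate_change_point(y):
    def sse(seg):
        mu = sum(seg) / len(seg)
        return sum((v - mu) ** 2 for v in seg)
    return min(range(1, len(y)), key=lambda k: sse(y[:k]) + sse(y[k:]))

def rate_ratio(y, k):
    """Ratio of the mean response rate after the change point to before."""
    return (sum(y[k:]) / len(y[k:])) / (sum(y[:k]) / len(y[:k]))

counts = [2, 3, 2, 3, 9, 10, 8, 11]   # hypothetical per-session counts
k = estimate_change_point(counts)      # split between sessions 4 and 5
print(k, rate_ratio(counts, k))        # 4 3.8
```

In this toy series the post-change mean rate (9.5) is 3.8 times the pre-change mean rate (2.5); the BUCP model additionally yields uncertainty about both the change point and the effect size.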


2012 ◽  
Vol 10 (4) ◽  
pp. 202-215
Author(s):  
Manoel Agamemnon Lopes ◽  
Roberta Vilhena Vieira Lopes
