LAPACK95 - HIGH PERFORMANCE LINEAR ALGEBRA PACKAGE

2000 ◽  
Vol 5 (1) ◽  
pp. 44-54 ◽  
Author(s):  
J. Dongarra ◽  
J. Waśniewski

LAPACK95 is a set of Fortran 95 subroutines that interfaces Fortran 95 with LAPACK. All LAPACK driver subroutines (including expert drivers) and some LAPACK computational routines have both generic LAPACK95 interfaces and generic LAPACK77 interfaces; the remaining computational routines have only generic LAPACK77 interfaces. In both types of interface, no distinction is made between single and double precision or between real and complex data types.
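As a loose illustration of the generic-interface idea (one call site covering all precisions and real/complex types), the following Python sketch uses SciPy's LAPACK wrappers, which dispatch on the array dtype in a way that is analogous to LAPACK95's generic interfaces such as LA_GESV. The analogy, not LAPACK95 itself, is what the code shows.

```python
# Hedged analogy: SciPy's LAPACK wrappers pick the precision/type variant of a
# routine from the array dtype, much as LAPACK95's generic interfaces (e.g.
# LA_GESV) cover single/double precision and real/complex data under one name.
import numpy as np
from scipy import linalg

for dtype in (np.float32, np.float64, np.complex64, np.complex128):
    a = np.asarray([[4, 1], [1, 3]], dtype=dtype)
    b = np.asarray([1, 2], dtype=dtype)
    x = linalg.solve(a, b)          # same call site for all four data types
    print(dtype.__name__, x.dtype, x)
```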

2021 ◽  
Vol 47 (3) ◽  
pp. 1-23
Author(s):  
Ahmad Abdelfattah ◽  
Timothy Costa ◽  
Jack Dongarra ◽  
Mark Gates ◽  
Azzam Haidar ◽  
...  

This article describes a standard API for a set of Batched Basic Linear Algebra Subprograms (Batched BLAS or BBLAS). The focus is on many independent BLAS operations on small matrices that are grouped together and processed by a single routine, called a Batched BLAS routine. The matrices are grouped into uniformly sized groups, with just one group if all the matrices are of equal size. The aim is to provide more efficient, yet portable, implementations of algorithms on high-performance many-core platforms. These include multicore and many-core CPU processors, GPUs and coprocessors, and other hardware accelerators with floating-point compute facilities. As well as the standard single and double precisions, the standard also includes half and quadruple precision. In particular, half precision is used in many very large-scale applications, such as those associated with machine learning.
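A minimal sketch of the batched idea described above: many small, equally sized matrix products handled by a single routine call. NumPy's matmul over stacked 3-D arrays stands in for a Batched BLAS routine here; it is an analogy for the grouping concept, not the BBLAS API itself.

```python
# One uniform group of many tiny GEMMs, all processed by a single routine call.
import numpy as np

batch, m, k, n = 1000, 8, 8, 8            # 1000 independent 8x8 matrix products
A = np.random.rand(batch, m, k)
B = np.random.rand(batch, k, n)

C = np.matmul(A, B)                        # the whole batch in one call
assert C.shape == (batch, m, n)
```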


Author(s):  
James Morrison ◽  
David Christie ◽  
Charles Greenwood ◽  
Ruairi Maciver ◽  
Arne Vogler

This paper presents a set of software tools for interrogating and processing time series data. The functionality of this toolset is demonstrated using data from a specific deployment involving multiple sensors deployed for a specific time period. The approach was developed initially for Datawell Waverider MKII/MKII buoys [1] and expanded to include data from acoustic devices, in this case Nortek AWACs. Tools of this nature are important because they address a specific lack of features in the sensor manufacturers' own tools; they also help to establish standard approaches for dealing with anomalous data from sensors. The software tools build upon an effective modern interpreted programming language, in this case Python, which has access to high-performance low-level libraries. The paper demonstrates the use of these tools applied to a sensor network off the north-west coast of Scotland, as described in [2,3]. Examples show computationally complex quantities, such as monthly averages, being calculated easily, and analysis down to a wave-by-wave basis is also demonstrated from the same source dataset. The tools make use of a flexible data structure called a DataFrame, which supports mixed data types, hierarchical and time indexing, and integration with modern plotting libraries. This allows sub-second querying and dynamic plotting of large datasets. By using modern compression techniques and file formats it is possible to process datasets that are larger than memory without the need for a traditional relational database. The software library should be of use to a wide variety of industries involved in offshore engineering, along with any scientists interested in the coastal environment.
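To illustrate the kind of query the abstract describes (monthly averages computed from a time-indexed DataFrame), here is a small Python/pandas sketch. The column names (Hm0, Tp) and the synthetic half-hourly records are illustrative assumptions, not the toolset's actual schema.

```python
# Monthly averages from half-hourly wave records held in a time-indexed DataFrame.
# The columns and values are synthetic placeholders for illustration only.
import numpy as np
import pandas as pd

index = pd.date_range("2019-01-01", "2019-06-30", freq="30min")
df = pd.DataFrame({
    "Hm0": np.random.gamma(2.0, 1.0, len(index)),    # significant wave height (m)
    "Tp": np.random.uniform(4.0, 14.0, len(index)),  # peak period (s)
}, index=index)

monthly = df.resample("MS").mean()    # monthly means computed in one expression
print(monthly)
```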


2021 ◽  
Vol 47 (2) ◽  
pp. 1-28
Author(s):  
Goran Flegar ◽  
Hartwig Anzt ◽  
Terry Cojean ◽  
Enrique S. Quintana-Ortí

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aimed at carefully reducing the working precision in order to speed up computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator, such as a preconditioner, in lower than working precision, hopefully without impacting the algorithm's output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner, which selects the precision format used to store the preconditioner data on the fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard but also customized formats that tailor the lengths of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.
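A much-simplified Python sketch of the adaptive-precision idea follows: each inverted diagonal block is stored in the lowest precision its conditioning appears to tolerate and converted back to working precision when applied. The thresholds and the float16/float32/float64 menu are assumptions for illustration; the paper's Ginkgo implementation additionally uses customized non-IEEE formats and runs on the GPU.

```python
import numpy as np

def choose_dtype(block_inv):
    """Pick a storage precision from the block's conditioning (illustrative thresholds)."""
    kappa = np.linalg.cond(block_inv)
    if kappa < 1e2:
        return np.float16
    if kappa < 1e6:
        return np.float32
    return np.float64

def build_adaptive_block_jacobi(A, block_size):
    """Invert each diagonal block and store it in an adaptively chosen precision."""
    blocks = []
    for start in range(0, A.shape[0], block_size):
        end = min(start + block_size, A.shape[0])
        inv = np.linalg.inv(A[start:end, start:end])
        blocks.append(inv.astype(choose_dtype(inv)))
    return blocks

def apply_preconditioner(blocks, r):
    """Apply z = M^{-1} r block by block, computing in working (double) precision."""
    z = np.empty_like(r)
    start = 0
    for inv in blocks:
        end = start + inv.shape[0]
        z[start:end] = inv.astype(np.float64) @ r[start:end]
        start = end
    return z

A = np.diag(np.arange(1.0, 13.0)) + 0.01 * np.random.rand(12, 12)
blocks = build_adaptive_block_jacobi(A, block_size=4)
print([b.dtype for b in blocks])
print(apply_preconditioner(blocks, np.random.rand(12)))
```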


2014 ◽  
Vol 40 (10) ◽  
pp. 559-573 ◽  
Author(s):  
Li Tan ◽  
Shashank Kothapalli ◽  
Longxiang Chen ◽  
Omar Hussaini ◽  
Ryan Bissiri ◽  
...  

2012 ◽  
Author(s):  
Marty Kraimer ◽  
John Dalesio

Electronics ◽  
2021 ◽  
Vol 10 (16) ◽  
pp. 1984
Author(s):  
Wei Zhang ◽  
Zihao Jiang ◽  
Zhiguang Chen ◽  
Nong Xiao ◽  
Yang Ou

Double-precision general matrix multiplication (DGEMM) is an essential kernel for measuring the potential performance of an HPC platform. ARMv8-based system-on-chips (SoCs) have become candidates for next-generation HPC systems thanks to their highly competitive performance and energy efficiency, so it is meaningful to design high-performance DGEMM for ARMv8-based SoCs. However, as ARMv8-based SoCs integrate an increasing number of cores, modern CPUs adopt non-uniform memory access (NUMA). NUMA restricts the performance and scalability of DGEMM when many threads access remote NUMA domains, which poses a challenge for developing high-performance DGEMM on multi-NUMA architectures. We present a NUMA-aware method that reduces the number of cross-die and cross-chip memory access events. The critical enabler for NUMA-aware DGEMM is to leverage two levels of parallelism, between and within nodes, in a purely threaded implementation, which allows task independence and data localization across NUMA nodes. We have implemented NUMA-aware DGEMM in OpenBLAS and evaluated it on a dual-socket server with 48-core processors based on the Kunpeng920 architecture. The results show that NUMA-aware DGEMM effectively reduces the number of cross-die and cross-chip memory accesses, significantly enhancing the scalability of DGEMM and increasing its performance by 17.1% on average, with the most remarkable improvement being 21.9%.
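As a conceptual sketch of the two-level partitioning (not the paper's OpenBLAS code), the following Python snippet splits the rows of the output matrix first across hypothetical NUMA nodes and then across the threads of each node, so that each thread's panel of A can stay node-local. Real implementations also pin threads and allocate buffers node-locally, which plain Python cannot express.

```python
# Conceptual two-level work partitioning for C = A @ B on a multi-NUMA machine:
# rows of C are split across NUMA nodes first, then across threads within each
# node, so the A panel a thread reads can live in node-local memory.
def numa_aware_gemm_plan(m, num_nodes, threads_per_node):
    """Map (node, thread) -> [lo, hi) row range of C owned by that thread."""
    plan = {}
    rows_per_node = (m + num_nodes - 1) // num_nodes
    for node in range(num_nodes):
        node_lo = node * rows_per_node
        node_hi = min(node_lo + rows_per_node, m)
        node_rows = node_hi - node_lo
        rows_per_thread = (node_rows + threads_per_node - 1) // threads_per_node
        for t in range(threads_per_node):
            lo = min(node_lo + t * rows_per_thread, node_hi)
            hi = min(lo + rows_per_thread, node_hi)
            plan[(node, t)] = (lo, hi)
    return plan

# Example: 48 rows of C, 2 NUMA nodes (e.g. two sockets), 4 threads per node.
print(numa_aware_gemm_plan(m=48, num_nodes=2, threads_per_node=4))
```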


2021 ◽  
Vol 4 ◽  
pp. 78-87
Author(s):  
Yury Yuschenko

In the Address Programming Language (1955), the concept of indirect addressing of higher ranks (pointers) was introduced, which allows arbitrary connections between the computer's RAM cells. These connections are based on the standard sequence of cell addresses in RAM and on addressing sequences determined by the programmer through indirect addressing. The two types of sequences allow programmers to establish arbitrary connections between RAM cells holding arbitrary content: data, addresses, subroutines, program labels, etc. The connections formed between cells can therefore refer to one another. The result of connecting cells of arbitrary content and arbitrary structure is called a tree-shaped format. Tree-shaped formats allow programmers to combine data into complex data structures that resemble abstract data types. For tree-shaped formats, the concept of a "review scheme" is defined, which is similar to the concept of tree traversal. Programmers can define multiple review schemes for one tree-shaped format, and can create tree-shaped formats over connected cells to define the desired review schemes for those cells. The work gives a modern interpretation of the concept of tree-shaped formats in Address Programming. Tree-shaped formats are based on the "stroke-operation" (pointer dereference), which was implemented in hardware in the instruction set of the computer "Kyiv". Group operations over addresses in the modernized computer "Kyiv" accelerate the processing of tree-shaped formats and are organized as cycles, like those in high-level imperative programming languages. Thanks to the operations with indirect addressing, the commands of the computer "Kyiv" have more capabilities than the first high-level programming language, Plankalkül. Machine commands of the computer "Kyiv" allow direct access to the i-th element of a "list" by its serial number, in the same way that the i-th element of an array is accessed by its index. The given examples of singly linked lists show the features of tree-shaped formats and their differences from abstract data types. The article opens a new branch of theoretical research whose purpose is to analyse the expediency of partially including Address Programming in modern programming languages.
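To make the linked-list discussion concrete in modern terms, here is a small Python sketch of cells connected by "addresses" (references) and traversed by dereferencing, loosely mirroring the "stroke-operation". It is a present-day paraphrase for illustration, not the Address Language or the "Kyiv" instruction set.

```python
# A modern paraphrase of the singly linked list example: each cell holds
# arbitrary content plus the "address" (reference) of the next cell, and the
# i-th element is reached by repeated dereferencing of that address.
class Cell:
    def __init__(self, content, nxt=None):
        self.content = content   # arbitrary content: data, an address, a label, ...
        self.next = nxt          # indirect address (reference) of the next cell

def ith(head, i):
    """Return the content of the i-th cell by chasing 'next' addresses i times."""
    cell = head
    for _ in range(i):
        cell = cell.next
    return cell.content

lst = Cell("a", Cell("b", Cell("c")))
print(ith(lst, 2))   # "c" -- reached by address chasing rather than array indexing
```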

