High-Performance Interactive Scientific Visualization With Datoviz via the Vulkan Low-Level GPU API

Parallel implementations of swarm intelligence algorithms such as ant colony optimization (ACO) have been widely used to shorten execution times when solving complex optimization problems. On GPUs, developing efficient parallel versions of such algorithms in CUDA can be a difficult and error-prone task, even for experienced programmers. The parallel programming model of Algorithmic Skeletons addresses this issue by abstracting away low-level features. It does so by defining common programming patterns (e.g. map, fold and zip) that are later translated into efficient parallel code. In this paper, we show how algorithmic skeletons formulated in the domain-specific language Musket support the development of a parallel ACO implementation, and how that implementation compares to a low-level one. Our experimental results show that Musket is well suited to developing ACO. Besides making the parallelization aspects easier for the programmer to handle, Musket generates high-performance code whose execution times are similar to those of low-level implementations.
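To make the skeleton idea concrete, here is a minimal sequential Python sketch of map, zip and fold. It shows two ACO building blocks expressed with them: pheromone evaporation as a map, and tour length as a zip followed by a fold. This is not Musket syntax or Musket's API. The names `map_skel`, `zip_skel`, `fold_skel`, `tour_length` and `evaporate` are hypothetical, and the functions run sequentially, whereas a skeleton compiler would emit parallel code for them.

```python
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def map_skel(f: Callable[[A], B], xs: Sequence[A]) -> List[B]:
    # Applies f to every element independently; a skeleton compiler could
    # distribute these calls across GPU threads.
    return [f(x) for x in xs]


def zip_skel(f: Callable[[A, B], C], xs: Sequence[A], ys: Sequence[B]) -> List[C]:
    # Combines two equally long sequences element by element.
    if len(xs) != len(ys):
        raise ValueError("zip_skel needs sequences of equal length")
    return [f(x, y) for x, y in zip(xs, ys)]


def fold_skel(f: Callable[[A, A], A], init: A, xs: Sequence[A]) -> A:
    # Reduces xs with f; f should be associative so that a parallel
    # backend could evaluate the reduction as a tree.
    return reduce(f, xs, init)


def tour_length(tour: List[int], dist: List[List[float]]) -> float:
    # Length of a closed tour: pair each city with its successor (wrapping
    # around), look up the edge lengths with zip, then sum them with fold.
    successors = tour[1:] + tour[:1]
    edges = zip_skel(lambda a, b: dist[a][b], tour, successors)
    return fold_skel(lambda x, y: x + y, 0.0, edges)


def evaporate(pheromone: List[float], rho: float) -> List[float]:
    # Standard ACO evaporation tau <- (1 - rho) * tau, applied to every
    # entry of a flattened pheromone matrix with map.
    return map_skel(lambda tau: (1.0 - rho) * tau, pheromone)
```

For example, with the distance matrix `[[0, 1, 3], [1, 0, 2], [3, 2, 0]]`, the closed tour `[0, 1, 2]` has length 1 + 2 + 3 = 6. The fold only yields a correct parallel reduction because addition is associative.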

High performance and low level of image sticking fringe field switching mode liquid crystal display with photo-alignment layers

Molecular Crystals and Liquid Crystals ◽

10.1080/15421406.2020.1716151 ◽

2019 ◽

Vol 692 (1) ◽

pp. 25-33

Author(s):

Masanobu Mizusaki

Keyword(s):

Liquid Crystal ◽

High Performance ◽

Liquid Crystal Display ◽

Fringe Field ◽

Low Level ◽

Photo Alignment ◽

Switching Mode ◽

Alignment Layers

10.5 - Low Level Flight Monitoring - High performance graphics

10.5162/ettc2018/10.5 ◽

2018 ◽

Author(s):

A. Rodrigo López Parra

Keyword(s):

High Performance ◽

Low Level ◽

Level Flight

Polyaromatic hydrocarbons as high-performance liquid chromatographic calibration standards for the low level determination of chlorinated dibenzo-p-dioxins and chlorinated dibenzofurans in biological samples

Journal of Chromatography A ◽

10.1016/s0021-9673(00)85050-5 ◽

1982 ◽

Vol 248 (3) ◽

pp. 409-415 ◽

Cited By ~ 5

Author(s):

John J. Ryan ◽

Jean C. Pilon

Keyword(s):

Biological Samples ◽

High Performance ◽

Polyaromatic Hydrocarbons ◽

High Performance Liquid Chromatographic ◽

Calibration Standards ◽

Low Level ◽

Liquid Chromatographic

Low-Level (PPB) Determination of Cisplatin in Cleaning Validation (Rinse Water) Samples. II. A High-Performance Liquid Chromatographic Method

Drug Development and Industrial Pharmacy ◽

10.1081/ddc-100101250 ◽

2000 ◽

Vol 26 (4) ◽

pp. 429-440 ◽

Cited By ~ 18

Author(s):

Rajagopalan Raghavan ◽

Mark Burchett ◽

David Loffredo ◽

Jo Anne Mulligan

Keyword(s):

Water Samples ◽

High Performance ◽

Low Level ◽

Cleaning Validation ◽

Rinse Water

VLIW DSP-Based Low-Level Instruction Scheme of Givens QR Decomposition for Real-Time Processing

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126617501298 ◽

2017 ◽

Vol 26 (09) ◽

pp. 1750129 ◽

Cited By ~ 2

Author(s):

Mohamed Najoui ◽

Mounir Bahtat ◽

Anas Hatim ◽

Said Belkouch ◽

Noureddine Chabini

Keyword(s):

High Performance ◽

QR Decomposition ◽

Numerical Linear Algebra ◽

Instruction Level Parallelism ◽

Management Approach ◽

Real Time Processing ◽

Low Level ◽

Processor Architectures ◽

Efficient Data ◽

Level Parallelism

QR decomposition (QRD) is one of the most widely used numerical linear algebra (NLA) kernels in signal processing applications, and its implementation has a considerable impact on system performance. As new processor architectures continue to gain ground in high-performance computing, QRD algorithms have to be redesigned to take advantage of their architectural features. However, on some architectures, such as very long instruction word (VLIW) processors, the compiler alone cannot make effective use of the available computational resources. This paper presents an efficient, optimized approach to implementing Givens QRD on a low-power platform based on a VLIW architecture. To overcome the compiler's limits in parallelizing most of the Givens arithmetic operations, we propose a low-level instruction scheme designed to maximize the parallelism rate and minimize clock cycles. The key contributions of this work are as follows: (i) a new, fast parallel design of the Givens algorithm that exploits VLIW features (i.e., instruction-level parallelism (ILP) and data-level parallelism (DLP)) as well as cache memory properties; (ii) an efficient data management approach that avoids cache misses and memory bank conflicts. Two DSP platforms, C6678 and AK2H12, were used as implementation targets. On average, the introduced parallel QR implementation achieves speedups of more than 12× over the standard algorithm version and more than 6× over the optimized QR routine implementations. Compared to the state of the art, the proposed implementation is at least 3.65 times faster than recent CPU implementations and 2.5 times faster than recent DSP implementations.
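For reference, here is a plain sequential Python sketch of Givens QR as the underlying numerical technique. Each rotation mixes two adjacent rows so that one subdiagonal entry becomes zero. It does not reproduce the paper's VLIW instruction scheme, ILP/DLP exploitation or cache management, and the names `givens` and `givens_qr` are hypothetical. Because each rotation touches only two rows, rotations acting on disjoint row pairs are independent, which is the kind of parallelism a low-level scheme can exploit; this sketch simply applies them one after another.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def givens(a: float, b: float) -> Tuple[float, float]:
    # Returns (c, s) such that [[c, s], [-s, c]] @ [a, b] = [r, 0] with r = hypot(a, b).
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r


def givens_qr(A: Matrix) -> Tuple[Matrix, Matrix]:
    # Factorizes an m x n matrix A into Q (m x m, orthogonal) and R (m x n, upper triangular).
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Zero column j from the bottom up, rotating rows i-1 and i.
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1][j], R[i][j])
            # Columns left of j are already zero in both rows, so start at j.
            for k in range(j, n):
                x, y = R[i - 1][k], R[i][k]
                R[i - 1][k] = c * x + s * y
                R[i][k] = -s * x + c * y
            # Accumulate Q <- Q @ G^T so that A = Q @ R holds at the end.
            for k in range(m):
                x, y = Q[k][i - 1], Q[k][i]
                Q[k][i - 1] = c * x + s * y
                Q[k][i] = -s * x + c * y
    return Q, R
```

For an n x n matrix this applies n(n-1)/2 rotations, each costing O(n) operations on R. That per-rotation arithmetic is the work the paper's low-level scheme schedules across the VLIW functional units.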

A framework for high-performance matrix multiplication based on hierarchical abstractions, algorithms and optimized low-level kernels

Concurrency and Computation: Practice and Experience ◽

10.1002/cpe.630 ◽

2002 ◽

Vol 14 (10) ◽

pp. 805-839 ◽

Cited By ~ 21

Author(s):

Vinod Valsalam ◽

Anthony Skjellum

Keyword(s):

High Performance ◽

Matrix Multiplication ◽

Low Level ◽

Performance Matrix

An Associative Data Parallel Compilation Model for Tight Integration of High Performance Knowledge Retrieval and Computing

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213094000078 ◽

1994 ◽

Vol 03 (01) ◽

pp. 97-125 ◽

Cited By ~ 3

Author(s):

Arvind K. Bansal

Keyword(s):

Performance Evaluation ◽

High Performance ◽

Loose Coupling ◽

Abstract Machine ◽

Data Movement ◽

Left Hand ◽

Low Level ◽

Data Parallel ◽

Data Alignment ◽

And Performance

Associative computation is characterized by the intertwining of search by content and data-parallel computation. An algebra for associative computation is described. A compilation-based model and a novel abstract machine for associative logic programming are presented. The model loosely couples the left-hand side of the program, treated as data, and the right-hand side, treated as low-level code. This representation achieves efficiency through associative computation and data alignment, both during goal reduction and during the execution of low-level abstract instructions. Data alignment reduces the overhead of data movement. Novel schemes are described for the associative manipulation of aliased uninstantiated variables and for data-parallel goal reduction in the presence of multiple occurrences of the same variable in a goal. The architecture, behavior, and performance evaluation of the model are presented.
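As a loose illustration of "search by content followed by data-parallel computation", the Python sketch below matches a flat pattern against every stored fact independently, as an associative processor would do in parallel, then computes over the matched bindings. A variable that occurs several times in the pattern must bind to the same value each time. This is not the paper's algebra, compilation model or abstract machine. The names `Var`, `match` and `associative_search` are hypothetical, and the facts are plain Python tuples.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple, Union

Value = Union[str, int]


@dataclass(frozen=True)
class Var:
    # A logic variable, identified by name.
    name: str


Term = Union[Value, Var]


def match(pattern: Tuple[Term, ...], fact: Tuple[Value, ...]) -> Optional[Dict[str, Value]]:
    # Matches a flat pattern against a ground fact; returns variable bindings or None.
    if len(pattern) != len(fact):
        return None
    env: Dict[str, Value] = {}
    for p, f in zip(pattern, fact):
        if isinstance(p, Var):
            # Repeated occurrences of one variable must agree.
            if p.name in env and env[p.name] != f:
                return None
            env[p.name] = f
        elif p != f:
            return None
    return env


def associative_search(pattern: Tuple[Term, ...], facts: Sequence[Tuple[Value, ...]]) -> List[Dict[str, Value]]:
    # Search by content: each fact is tested independently of the others,
    # so on associative hardware all tests could run at once.
    results = [match(pattern, f) for f in facts]
    return [env for env in results if env is not None]
```

For example, with the facts `("edge", 1, 2)`, `("edge", 2, 2)`, `("edge", 3, 3)` and `("node", 1, 1)`, the pattern `("edge", X, X)` matches only the two self-loops. A follow-up data-parallel step, such as summing `Y` over all matches of `("edge", X, Y)`, gives 2 + 2 + 3 = 7.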
