ALGORITHMIC OPTIMIZATION OF SOFTWARE IMPLEMENTATION OF ALGORITHMS FOR MULTIPLYING DENSE REAL MATRICES ON GRAPHICS PROCESSORS WITH OPENGL TECHNOLOGY SUPPORT

This article states the problem of matrix multiplication. It is shown that the problem is simple to formulate, but that solving it efficiently may require both heuristic methods and a set of algorithmic modifications, covering algorithmic and high-level software optimizations, that take the particular problem into account and increase multiplication performance. These include a comparative analysis of performance with and without GPU-specific optimizations, which showed that computations that do not optimize access to global GPU memory achieve low performance. Optimizing the distribution of data between the GPU's global and local memory reduces the calculation time and increases real performance. To compare the software implementations developed for OpenGL and CUDA, identical calculations were performed on identical GPUs; they showed higher real performance when using CUDA. Specific performance values measured for the multi-threaded GPU implementation are given for each of the described optimizations. It is shown that the most effective approach is to cache sub-blocks of the matrices (tiles) in the GPU's on-chip local memory, which, with a specialized software implementation, provides a performance of 275.3 GFLOP/s on a GeForce GTX 960M GPU.
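The tiling idea named in the abstract can be illustrated on the CPU. The following is a minimal Python sketch, not the authors' OpenGL or CUDA code: the function names `matmul_tiled` and `gflops` and the default tile size are hypothetical, and the throughput helper assumes the conventional count of 2·N³ floating-point operations for an N×N product, which the abstract does not state. In a GPU kernel, the two sub-blocks copied inside the loop would be loaded once into on-chip local memory and reused by every thread of a work-group.

```python
def matmul_tiled(a, b, tile=16):
    """Multiply matrices a (n x k) and b (k x m), given as lists of rows,
    one tile x tile sub-block at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # Stand-in for the copy into on-chip local memory:
                # each sub-block is fetched once and then reused.
                a_tile = [row[k0:k0 + tile] for row in a[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in b[k0:k0 + tile]]
                for i, a_row in enumerate(a_tile):
                    c_row = c[i0 + i]
                    for kk, a_val in enumerate(a_row):
                        for j, b_val in enumerate(b_tile[kk]):
                            c_row[j0 + j] += a_val * b_val
    return c


def gflops(n, seconds):
    """Throughput for an n x n product, assuming 2 * n**3 operations."""
    return 2 * n ** 3 / seconds / 1e9


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_tiled(a, b, tile=2))
```

The tile size is a tuning parameter: on a GPU it is bounded by the amount of local memory available per work-group, so the value used here is only a placeholder.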

Integrated Magnetohydrodynamic Pump with Magnetic Composite Substrate and Laser-Induced Graphene Electrodes

Polymers ◽

10.3390/polym13071113 ◽

2021 ◽

Vol 13 (7) ◽

pp. 1113

Author(s):

Mohammed Asadullah Khan ◽

Jürgen Kosel

Keyword(s):

Magnetic Flux ◽

Flow Rate ◽

Magnetic Composite ◽

Propulsive Force ◽

Closed Channel ◽

Lab On Chip ◽

Composite Substrate ◽

On Chip ◽

High Level ◽

Moving Parts

An integrated polymer-based magnetohydrodynamic (MHD) pump that can actuate saline fluids in closed-channel devices is presented. MHD pumps are attractive for lab-on-chip applications because they provide high propulsive force without any moving parts. Unlike other MHD devices, a high level of integration is demonstrated by incorporating both laser-induced graphene (LIG) electrodes and a NdFeB magnetic-flux source in the NdFeB-polydimethylsiloxane permanent magnetic composite substrate. The effects of transferring the LIG film from polyimide to the magnetic composite substrate were studied. Operation of the integrated magnetohydrodynamic pump without disruptive bubbles was achieved. In the studied case, the pump produces a flow rate of 28.1 µL/min while consuming ~1 mW of power.

From high-level specifications down to software implementations of parallel embedded real-time systems

Proceedings Design, Automation and Test in Europe Conference and Exhibition 2000 (Cat. No. PR00537) ◽

10.1109/date.2000.840861 ◽

2002 ◽

Author(s):

C. Rust ◽

F. Stappert ◽

P. Altenbernd ◽

J. Tacken

Keyword(s):

Real Time ◽

Real Time Systems ◽

Software Implementations ◽

High Level ◽

Time Systems

Cooperative Mechanism of Local Memory and Cache in Network Processors

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.380-384.1969 ◽

2013 ◽

Vol 380-384 ◽

pp. 1969-1972

Author(s):

Bo Yuan ◽

Jin Dou Fan ◽

Bin Liu

Keyword(s):

Packet Processing ◽

Network Processors ◽

Local Memory ◽

Memory Space ◽

Memory Mechanism ◽

Hierarchical Memory ◽

Packet Data ◽

On Chip ◽

Cache Mechanism ◽

Cache Miss

Traditional network processors (NPs) adopt either a local memory mechanism or a cache mechanism as the hierarchical memory structure. The local memory mechanism usually has a small on-chip memory space, which does not suit varied and complicated applications. The cache mechanism is better at handling temporary data that must be read and written frequently, but in deep packet processing a cache miss occurs when each segment of a packet is read. We propose a cooperative mechanism of local memory and cache, in which the packet data and the temporary data are stored in local memory and cache, respectively. The analysis and experimental evaluation show that the cooperative mechanism can improve the performance of network processors and reduce processing latency at little extra resource cost.

High-level Estimation and Exploration of Reliability for Multi-Processor System-on-Chip

10.1007/978-981-10-1073-6 ◽

2018 ◽

Author(s):

Zheng Wang ◽

Anupam Chattopadhyay

Keyword(s):

System On Chip ◽

High Level Estimation ◽

On Chip ◽

High Level

High Level System-on-Chip Design using UML and SystemC

Electronics, Robotics and Automotive Mechanics Conference (CERMA 2007) ◽

10.1109/cerma.2007.4367776 ◽

2007 ◽

Cited By ~ 1

Author(s):

Blanca Alicia Correa ◽

Juan Fernando Eusse ◽

Danny Munera ◽

Jose Edinson Aedo ◽

Juan Fernando Velez

Keyword(s):

System On Chip ◽

Level System ◽

Chip Design ◽

On Chip ◽

High Level

DIVERSITY AES IN MIXCOLUMNS STEP WITH 8X8 CIRCULANT MATRIX

International Journal of Engineering Technologies and Management Research ◽

10.29121/ijetmr.v8.i9.2021.1037 ◽

2021 ◽

Vol 8 (9) ◽

pp. 19-35

Author(s):

Yan-Wen Chen ◽

Jeng-Jung Wang ◽

Yan-Haw Chen ◽

Chong-Dao Lee

Keyword(s):

Embedded Systems ◽

Efficient Method ◽

Matrix Multiplication ◽

Circulant Matrix ◽

Circulant Matrices ◽

Matrix Operation ◽

Branch Number ◽

Involutory Matrix ◽

Software Implementations ◽

Encryption Decryption

In the AES MixColumns operation, using 8×8 circulant matrices raises the branch number of the circulant matrix from 5 to 9, which can enhance the diffusion power. An efficient method to compute the circulant matrices in the AES MixColumns transformation, which speeds up encryption, is presented. An 8×8 involutory matrix multiplication requires 64 multiplications and 56 additions in the AES MixColumns transformation. The proposed method with diverse 8×8 circulant matrices needs only 19 multiplications and 57 additions. It applies not only to encryption but also to decryption. As a result, the 8×8 circulant matrix operation with AES key sizes of 128 bits, 192 bits, and 256 bits is more than 29.1%, 29.3%, and 29.8% faster, respectively, than the 4×4 involutory matrix operation (16 multiplications, 12 additions). The 8×8 circulant matrix encryption/decryption is more than 78% faster than the 8×8 involutory matrix operation. Ultimately, the proposed method for evaluating matrix multiplication can be made regular, simple, and suitable for software implementations on embedded systems.
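The operation counts quoted in the abstract follow from a direct matrix-vector product: an n×n matrix applied to one column costs n² multiplications and n(n−1) additions. The Python sketch below illustrates this baseline only; it does not reproduce the paper's optimized 19-multiplication method, and the paper's 8×8 matrices are not given in the abstract. As an assumption for illustration, it uses the standard 4×4 AES MixColumns circulant matrix with first row (2, 3, 1, 1) over GF(2^8) with the AES reduction polynomial 0x11B. The helper names `gf_mul`, `circulant`, `mat_vec`, and `naive_cost` are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce by the AES polynomial
    return result


def circulant(first_row):
    """Build a circulant matrix: each row is the previous one rotated right."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


def mat_vec(matrix, column):
    """Direct matrix-vector product; addition in GF(2^8) is XOR."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gf_mul(coeff, byte)
        out.append(acc)
    return out


def naive_cost(n):
    """(multiplications, additions) of a direct n x n product with one column."""
    return n * n, n * (n - 1)


if __name__ == "__main__":
    mix = circulant([2, 3, 1, 1])  # standard AES MixColumns matrix
    print([hex(x) for x in mat_vec(mix, [0xDB, 0x13, 0x53, 0x45])])
    print(naive_cost(4), naive_cost(8))
```

With this baseline, `naive_cost(4)` and `naive_cost(8)` give the 16/12 and 64/56 figures cited for the involutory matrices, which is the cost the proposed method is compared against.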

Molecular Fingerprints of Hemoglobin on a Nanofilm Chip

Sensors ◽

10.3390/s18093016 ◽

2018 ◽

Vol 18 (9) ◽

pp. 3016 ◽

Cited By ~ 20

Author(s):

Yeşeren Saylan ◽

Adil Denizli

Keyword(s):

Real Time ◽

Theoretical Calculations ◽

Essential Element ◽

Hemoglobin Concentration ◽

Ammonium Persulfate ◽

Detection Methods ◽

Molecular Fingerprints ◽

Molecularly Imprinted ◽

On Chip ◽

High Level

Hemoglobin is an iron-carrying protein in erythrocytes and an essential element for transferring oxygen from the lungs to the tissues. Abnormalities in hemoglobin concentration are closely correlated with health status and many diseases, including thalassemia, anemia, leukemia, heart disease, and excessive loss of blood. Particularly in resource-constrained settings, existing blood analyzers are not readily applicable because they need high-level instrumentation and skilled personnel, so inexpensive, easy-to-use, and reliable detection methods are needed. Herein, molecular fingerprints of hemoglobin on a nanofilm chip were obtained for real-time, sensitive, and selective hemoglobin detection using a surface plasmon resonance system. Briefly, through the photopolymerization technique, a template (hemoglobin) was imprinted on a monomeric (acrylamide) nanofilm on-chip using a cross-linker (methylenebisacrylamide) and an initiator-activator pair (ammonium persulfate-tetramethylethylenediamine). The molecularly imprinted nanofilm on-chip was characterized by atomic force microscopy and ellipsometry, followed by benchmarking of the detection performance for hemoglobin concentrations from 0.0005 mg mL−1 to 1.0 mg mL−1. Theoretical calculations and real-time detection implied that the molecularly imprinted nanofilm on-chip was able to detect as little as 0.00035 mg mL−1 of hemoglobin. In addition, the experimental results of hemoglobin detection on the chip fitted well with the Langmuir adsorption isotherm model, with a high correlation coefficient (0.99) and association and dissociation coefficients (39.1 mL mg−1 and 0.03 mg mL−1) suggesting a monolayer binding characteristic. Assessments of selectivity, reusability, and storage stability indicated that the presented chip is an alternative approach to current hemoglobin-targeted assays in low-resource regions, as well as to antibody-based detection procedures in the field. In the future, this molecularly imprinted nanofilm on-chip can easily be integrated with portable plasmonic detectors, improving its access to these regions, and it can be tailored to detect other proteins and biomarkers.
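For readers unfamiliar with the Langmuir model mentioned above, the following Python sketch evaluates the textbook Langmuir isotherm, in which the fractional surface coverage is θ = K_A·C / (1 + K_A·C). It is an illustration under stated assumptions, not the authors' fitting procedure: the function name `langmuir_coverage` is hypothetical, the only value taken from the abstract is the association coefficient of 39.1 mL mg−1, and the intermediate concentrations in the loop are arbitrary sample points within the reported range.

```python
def langmuir_coverage(concentration, k_association):
    """Fractional coverage of binding sites for a Langmuir monolayer.

    concentration: analyte concentration in mg/mL
    k_association: association coefficient in mL/mg
    """
    x = k_association * concentration
    return x / (1.0 + x)


if __name__ == "__main__":
    K_A = 39.1  # mL/mg, association coefficient quoted in the abstract
    # Sample points between the reported limits of 0.0005 and 1.0 mg/mL.
    for c in (0.0005, 0.01, 0.1, 1.0):
        print(c, langmuir_coverage(c, K_A))
```

In this model half of the binding sites are occupied at C = 1/K_A, which for K_A = 39.1 mL mg−1 is about 0.026 mg mL−1, consistent with the dissociation coefficient of 0.03 mg mL−1 quoted to one significant figure.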

Design Techniques for Microfluidic Devices Implementation Applicable to Chemical Analysis Systems

Process Analysis, Design, and Intensification in Microfluidics and Chemical Engineering - Advances in Chemical and Materials Engineering ◽

10.4018/978-1-5225-7138-4.ch007 ◽

2019 ◽

pp. 195-222 ◽

Cited By ~ 1

Author(s):

Reinaldo Lucas dos Santos Rosa ◽

Antonio Carlos Seabra

Keyword(s):

Chemical Analysis ◽

Solid Phase ◽

Microfluidic Devices ◽

Water Quality Monitoring ◽

Analysis Process ◽

Micro Total Analysis Systems ◽

Total Analysis ◽

On Chip ◽

Separation Cell ◽

High Level

This chapter provides a guide to the development and optimization of microfluidic devices for chemical analysis applications, which include medicine, biology, chemistry, and environmental monitoring, showing high-level performance associated with a specific functionality. Examples are chemical analysis, solid phase extraction, chromatography, immunoassay analysis, protein and DNA separation, cell sorting and manipulation, cellular biology, and mass spectrometry. In this chapter, most of the information relates to the design and fabrication of microfluidic devices used to perform several steps of chemical analysis (preparation of reagents, sample reaction, and detection) with regard to water quality monitoring. These steps are especially relevant to lab-on-chip (LOC) and micro-total-analysis-systems (μTAS). μTAS devices are developed to simplify the analytical chemist's work by incorporating several analytical procedures into flow systems. In miniaturized devices, the analysis time is reduced, and small volumes (nL) can be used.

Communication-centric high level synthesis metrics for low vertical channel density 3-dimensional Networks-on-Chip

7th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC) ◽

10.1109/recosoc.2012.6322897 ◽

2012 ◽

Cited By ~ 2

Author(s):

Haoyuan Ying ◽

Thomas Hollstein ◽

Klaus Hofmann

Keyword(s):

Vertical Channel ◽

High Level Synthesis ◽

Networks On Chip ◽

3 Dimensional ◽

Channel Density ◽

On Chip ◽

High Level

Implementation of algebraic procedures on the GPU using CUDA architecture on the example of generalized eigenvalue problem

Open Computer Science ◽

10.1515/comp-2016-0006 ◽

2016 ◽

Vol 6 (1) ◽

pp. 79-90

Author(s):

Łukasz Syrocki ◽

Grzegorz Pestka

Keyword(s):

Eigenvalue Problem ◽

Graphics Processing Unit ◽

Generalized Eigenvalue Problem ◽

Processing Unit ◽

Graphics Processors ◽

Central Processing ◽

Generalized Eigenvalue ◽

Cuda Technology ◽

Cuda Architecture ◽

High Level

A ready-to-use set of functions is provided to facilitate solving a generalized eigenvalue problem for symmetric matrices, in order to efficiently calculate eigenvalues and eigenvectors using Compute Unified Device Architecture (CUDA) technology from NVIDIA. An integral part of CUDA is the high-level programming environment, which enables tracking both the code executed on the Central Processing Unit and the code executed on the Graphics Processing Unit. The presented matrix structures allow for the analysis of the advantages of using graphics processors in such calculations.
