ALGORITHMIC OPTIMIZATION OF SOFTWARE IMPLEMENTATION OF ALGORITHMS FOR MULTIPLYING DENSE REAL MATRICES ON GRAPHICS PROCESSORS WITH OPENGL TECHNOLOGY SUPPORT

2017 ◽  
Vol 21 (5) ◽  
pp. 6-15
Author(s):  
Y. A. Zatolokin ◽  
E. I. Vatutin ◽  
V. S. Titov

The article states the problem of matrix multiplication. It is shown that the problem can be simply formulated, but solving it efficiently may require both heuristic methods and a set of algorithmic modifications, relating to algorithmic and high-level software optimization, that take the particular problem into account and increase multiplication performance. These include a comparative analysis of performance with and without GPU-specific optimizations, which showed that computations that do not optimize access to global GPU memory have low processing performance. Optimizing the distribution of data between the GPU's global and local memory makes it possible to reduce calculation time and increase real performance. To compare the performance of the developed software implementations for OpenGL and CUDA technologies, identical calculations were performed on identical GPUs, which showed higher real performance when using CUDA cores. Specific performance values measured for the multi-threaded GPU implementation are given for each of the described optimizations. It is shown that the most effective approach is caching sub-blocks of the matrices (tiles) in the GPU's on-chip local memory, which, with a specialized software implementation, provides a performance of 275.3 GFLOP/s on a GeForce GTX 960M GPU.
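The tiling idea in the abstract above can be illustrated with a minimal CPU-side sketch in plain Python. It is not the authors' OpenGL or CUDA code: the function names `matmul_naive` and `matmul_tiled` and the `tile` parameter are hypothetical, and the copied sub-blocks only stand in for the tiles that a GPU kernel would stage in on-chip local memory.

```python
def matmul_naive(a, b):
    """Reference product: every operand is fetched straight from the full matrices."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_tiled(a, b, tile=2):
    """Blocked product: each (tile x tile) sub-block is copied once and reused.

    The copies a_tile and b_tile play the role of tiles cached in fast local
    memory; on a GPU, one work-group would load them cooperatively.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # Stage one tile of A and one tile of B (slices clamp at the edges).
                a_tile = [row[k0:k0 + tile] for row in a[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in b[k0:k0 + tile]]
                # Accumulate the partial products of this tile pair into C.
                for i, a_row in enumerate(a_tile):
                    for j in range(len(b_tile[0])):
                        acc = 0.0
                        for p, a_val in enumerate(a_row):
                            acc += a_val * b_tile[p][j]
                        c[i0 + i][j0 + j] += acc
    return c
```

With a tile edge of T, each staged element is reused T times from the copy, which is why global-memory traffic drops roughly by a factor of T in a real GPU kernel; the sketch shows only the loop structure, not the speedup.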

Polymers ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 1113
Author(s):  
Mohammed Asadullah Khan ◽  
Jürgen Kosel

An integrated polymer-based magnetohydrodynamic (MHD) pump that can actuate saline fluids in closed-channel devices is presented. MHD pumps are attractive for lab-on-chip applications because they can provide high propulsive force without any moving parts. Unlike other MHD devices, a high level of integration is demonstrated by incorporating both laser-induced graphene (LIG) electrodes and a NdFeB magnetic-flux source in the NdFeB-polydimethylsiloxane permanent magnetic composite substrate. The effects of transferring the LIG film from polyimide to the magnetic composite substrate were studied. Operation of the integrated magnetohydrodynamic pump without disruptive bubbles was achieved. In the studied case, the pump produces a flow rate of 28.1 µL/min while consuming ~1 mW of power.


2013 ◽  
Vol 380-384 ◽  
pp. 1969-1972
Author(s):  
Bo Yuan ◽  
Jin Dou Fan ◽  
Bin Liu

Traditional network processors (NPs) adopt either a local memory mechanism or a cache mechanism as the hierarchical memory structure. The local memory mechanism usually has a small on-chip memory space, which does not suit varied, complicated applications. The cache mechanism is better at handling temporary data that must be read and written frequently. In deep packet processing, however, a cache miss occurs when each segment of a packet is read. We propose a cooperative mechanism of local memory and cache, in which packet data and temporary data are stored in local memory and cache, respectively. Analysis and experimental evaluation show that the cooperative mechanism can improve the performance of network processors and reduce processing latency at little extra resource cost.
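The motivation for splitting packet data from temporary data can be seen in a toy simulation. This is only a sketch under strong assumptions, not the authors' architecture: the cache is modeled as fully associative LRU, every packet segment is read exactly once, and the helper names (`lru_misses`, `build_trace`, `compare`) and all sizes are hypothetical.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Count misses of a fully associative LRU cache over an address trace."""
    cache = OrderedDict()
    misses = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)        # refresh recency on a hit
        else:
            misses += 1
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def build_trace(num_segments, temp_vars):
    """Each streamed packet segment is followed by reuse of the temporaries."""
    trace = []
    for s in range(num_segments):
        trace.append(("pkt", s))           # packet segment: read once, never reused
        for t in range(temp_vars):
            trace.append(("tmp", t))       # temporary data: reused every round
    return trace


def compare(num_segments=6, temp_vars=4, capacity=4):
    """Return (misses with everything cached, misses with packets in local memory)."""
    trace = build_trace(num_segments, temp_vars)
    unified = lru_misses(trace, capacity)
    # Cooperative scheme (assumed): packet reads bypass the cache entirely.
    temp_only = [addr for addr in trace if addr[0] == "tmp"]
    cooperative = lru_misses(temp_only, capacity)
    return unified, cooperative
```

In this toy setting the streamed segments keep evicting the temporaries, so the unified cache misses on every access, while the cache that holds only temporaries misses just on its first pass. Real hardware with set associativity and prefetching would behave differently.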


Author(s):  
Blanca Alicia Correa ◽  
Juan Fernando Eusse ◽  
Danny Munera ◽  
Jose Edinson Aedo ◽  
Juan Fernando Velez

Author(s):  
Yan-Wen Chen ◽  
Jeng-Jung Wang ◽  
Yan-Haw Chen ◽  
Chong-Dao Lee

In the AES MixColumns operation, the branch number of the circulant matrix is raised from 5 to 9 by using 8×8 circulant matrices, which can enhance the diffusion power. An efficient method for computing the circulant matrices in the AES MixColumns transformation to speed up encryption is presented. An 8×8 involutory matrix multiplication requires 64 multiplications and 56 additions in the AES MixColumns transformation. The proposed method, using diverse 8×8 circulant matrices, needs only 19 multiplications and 57 additions. It applies not only to encryption but also to decryption. With AES key sizes of 128, 192, and 256 bits, the 8×8 circulant matrix operation is more than 29.1%, 29.3%, and 29.8% faster, respectively, than the 4×4 involutory matrix operation (16 multiplications, 12 additions). The 8×8 circulant matrix encryption/decryption speed is more than 78% faster than the 8×8 involutory matrix operation. Ultimately, the proposed method for evaluating matrix multiplication can be made regular, simple, and suitable for software implementations on embedded systems.
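For orientation, the sketch below multiplies one state column by a circulant matrix over GF(2^8), using the standard 4×4 AES MixColumns matrix (first row 2, 3, 1, 1) and its standard inverse. It does not reproduce the authors' 8×8 matrices or their reduced-operation method, which the abstract does not specify; the helper names are hypothetical. A straightforward n×n product like this one costs n² multiplications and n(n−1) additions, which gives the 16 and 12 (n = 4) and 64 and 56 (n = 8) quoted above.

```python
def xtime(x):
    """Multiply by 2 in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11B
    return x & 0xFF


def gf_mul(a, b):
    """Multiply two field elements by shift-and-add (XOR is field addition)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def circulant(first_row):
    """Build the matrix whose row i is first_row rotated right by i positions."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


def mix_column(matrix, column):
    """Matrix-vector product over GF(2^8): one MixColumns step on one column."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gf_mul(coeff, byte)
        out.append(acc)
    return out


MIX = circulant([2, 3, 1, 1])          # standard AES MixColumns matrix
INV_MIX = circulant([14, 11, 13, 9])   # standard AES InvMixColumns matrix
```

Applying `mix_column(MIX, [0xDB, 0x13, 0x53, 0x45])` yields `[0x8E, 0x4D, 0xA1, 0xBC]`, a widely used MixColumns test vector, and `INV_MIX` maps it back.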


Sensors ◽  
2018 ◽  
Vol 18 (9) ◽  
pp. 3016 ◽  
Author(s):  
Yeşeren Saylan ◽  
Adil Denizli

Hemoglobin is an iron-carrying protein in erythrocytes and an essential element for transferring oxygen from the lungs to the tissues. Abnormalities in hemoglobin concentration are closely correlated with health status and many diseases, including thalassemia, anemia, leukemia, heart disease, and excessive loss of blood. Particularly in resource-constrained settings, existing blood analyzers are not readily applicable because they need high-level instrumentation and skilled personnel; inexpensive, easy-to-use, and reliable detection methods are therefore needed. Herein, a molecular fingerprint of hemoglobin on a nanofilm chip was obtained for real-time, sensitive, and selective hemoglobin detection using a surface plasmon resonance system. Briefly, through the photopolymerization technique, a template (hemoglobin) was imprinted on a monomeric (acrylamide) nanofilm on-chip using a cross-linker (methylenebisacrylamide) and an initiator-activator pair (ammonium persulfate-tetramethylethylenediamine). The molecularly imprinted nanofilm on-chip was characterized by atomic force microscopy and ellipsometry, followed by benchmarking of detection performance for hemoglobin concentrations from 0.0005 mg mL−1 to 1.0 mg mL−1. Theoretical calculations and real-time detection implied that the molecularly imprinted nanofilm on-chip was able to detect as little as 0.00035 mg mL−1 of hemoglobin. In addition, the experimental results of hemoglobin detection on the chip fitted well with the Langmuir adsorption isotherm model, with a high correlation coefficient (0.99) and association and dissociation coefficients (39.1 mL mg−1 and 0.03 mg mL−1) suggesting a monolayer binding characteristic. Assessments of selectivity, reusability, and storage stability indicated that the presented chip is an alternative approach to current hemoglobin-targeted assays in low-resource regions, as well as to antibody-based detection procedures in the field. In the future, this molecularly imprinted nanofilm on-chip can easily be integrated with portable plasmonic detectors, improving its access to these regions, and it can be tailored to detect other proteins and biomarkers.
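As a small worked illustration of the Langmuir model mentioned above, the sketch below evaluates the textbook fractional-coverage form θ = C / (K_D + C). It takes the dissociation coefficient of 0.03 mg/mL from the abstract as a default; the function name and the use of fractional coverage (rather than the sensor's actual response signal) are assumptions, and this is not the authors' fitting procedure.

```python
def langmuir_coverage(conc_mg_per_ml, k_d_mg_per_ml=0.03):
    """Fraction of occupied binding sites for a Langmuir (monolayer) isotherm.

    theta = C / (K_D + C), where C is the analyte concentration and K_D the
    dissociation coefficient, both in mg/mL.
    """
    if conc_mg_per_ml < 0 or k_d_mg_per_ml <= 0:
        raise ValueError("concentration must be >= 0 and K_D must be > 0")
    return conc_mg_per_ml / (k_d_mg_per_ml + conc_mg_per_ml)


def coverage_table(concentrations, k_d_mg_per_ml=0.03):
    """Return (concentration, coverage) pairs for a list of concentrations."""
    return [(c, langmuir_coverage(c, k_d_mg_per_ml)) for c in concentrations]
```

By construction, half of the sites are occupied when the concentration equals K_D, and the coverage saturates toward 1 at the upper end of the tested range (1.0 mg/mL).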


Author(s):  
Reinaldo Lucas dos Santos Rosa ◽  
Antonio Carlos Seabra

This chapter provides a guide to the development and optimization of microfluidic devices for chemical analysis applications in medicine, biology, chemistry, and environmental monitoring, showing high-level performance associated with a specific functionality. Examples are chemical analysis, solid-phase extraction, chromatography, immunoassay analysis, protein and DNA separation, cell sorting and manipulation, cellular biology, and mass spectrometry. Most of the information in this chapter concerns the design and fabrication of microfluidic devices used to perform several steps of chemical analysis (preparation of reagents, sample reaction, and detection) for water quality monitoring. These steps are especially relevant to lab-on-chip (LOC) devices and micro-total-analysis systems (μTAS). μTAS devices are developed to simplify the analytical chemist's work by incorporating several analytical procedures into flow systems. In miniaturized devices, the analysis time is reduced and small volumes (nL) can be used.


2016 ◽  
Vol 6 (1) ◽  
pp. 79-90
Author(s):  
Łukasz Syrocki ◽  
Grzegorz Pestka

A ready-to-use set of functions is provided to facilitate solving the generalized eigenvalue problem for symmetric matrices, in order to efficiently calculate eigenvalues and eigenvectors using Compute Unified Device Architecture (CUDA) technology from NVIDIA. An integral part of CUDA is its high-level programming environment, which enables tracking of code executed both on the Central Processing Unit and on the Graphics Processing Unit. The presented matrix structures allow the advantages of using graphics processors in such calculations to be analyzed.
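To make the problem statement concrete, the generalized symmetric eigenvalue problem A x = λ B x can be reduced to a standard one through the Cholesky factor of B. The sketch below does this on the CPU for 2×2 matrices only, assuming B is symmetric positive definite. It is not the authors' CUDA function set, and the abstract does not say whether they use this reduction; all names are hypothetical.

```python
import math


def cholesky_2x2(b):
    """Lower-triangular factor L of a symmetric positive definite 2x2 B = L L^T."""
    l11 = math.sqrt(b[0][0])
    l21 = b[1][0] / l11
    l22 = math.sqrt(b[1][1] - l21 * l21)
    return l11, l21, l22


def generalized_eigenvalues_2x2(a, b):
    """Eigenvalues of A x = lambda B x for symmetric A and SPD B (2x2 only).

    Forms C = L^{-1} A L^{-T}, which is symmetric and has the same eigenvalues,
    then uses the closed form for a symmetric 2x2 matrix.
    """
    l11, l21, l22 = cholesky_2x2(b)
    # M = L^{-1} A, computed by forward substitution row by row.
    m00, m01 = a[0][0] / l11, a[0][1] / l11
    m10 = (a[1][0] - l21 * m00) / l22
    m11 = (a[1][1] - l21 * m01) / l22
    # C = M L^{-T}; only the upper triangle is needed because C is symmetric.
    c00 = m00 / l11
    c01 = (m01 - l21 * c00) / l22
    c11 = (m11 - l21 * m10 / l11) / l22
    mean = 0.5 * (c00 + c11)
    radius = math.hypot(0.5 * (c00 - c11), c01)
    return mean - radius, mean + radius


def det_pencil(a, b, lam):
    """det(A - lam * B); it vanishes exactly at the generalized eigenvalues."""
    p = a[0][0] - lam * b[0][0]
    q = a[0][1] - lam * b[0][1]
    r = a[1][0] - lam * b[1][0]
    s = a[1][1] - lam * b[1][1]
    return p * s - q * r
```

For example, `generalized_eigenvalues_2x2([[2, 1], [1, 2]], [[2, 0], [0, 1]])` returns the two roots of 2λ² − 6λ + 3 = 0, and `det_pencil` can be used to check any result. Larger matrices would need a general Cholesky factorization and an iterative symmetric eigensolver, which is where a GPU library becomes relevant.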

