Performance Reduction for Automatic Development of Parallel Applications for Reconfigurable Computer Systems

2020 ◽  
Vol 7 (2) ◽  
Author(s):  
A. I. Dordopulo

In this paper, we review and compare two approaches to developing parallel applications: automatic parallelization of programs for computer systems with shared and distributed memory, and reduction of the hardware costs and performance of the information graph for reconfigurable computer systems. Increasing the number of units in a computer system or the dimension of the problem significantly raises the complexity of automatically parallelizing a procedural program. As a result, obtaining parallelization results in acceptable time, even on state-of-the-art computer systems, is very problematic. For reconfigurable computer systems, a parallel program is created by reducing the absolutely parallel information graph of the problem. The information graph describes the parallelization and pipelining of computations. In addition to the traditional reduction of the number of basic subgraphs, reductions of the number of computational operations and of the data width can be used to scale performance or hardware costs. We prove that, compared with automatic parallelization, the methods of reducing information graph hardware costs and performance considerably decrease the number of steps needed to adapt a parallel application to the architecture of a reconfigurable computer system. We also prove a theorem on the value of the coefficient under sequential reduction, a theorem on increasing the reduction coefficient by a custom value, and a theorem on the commutativity of different reduction transformations. These theorems help to find a rational sequence of reduction transformations.


2019 ◽  
Author(s):  
Arthur Krause ◽  
Francis Moreira ◽  
Valéria Girelli ◽  
Philippe Olivier Navaux

As processors evolve, the performance of computer systems is increasingly limited by memory access time. Caches are employed to work around this problem, but the data stored in them must be managed intelligently to prevent problems such as pollution and thrashing from degrading their performance. This work presents an analysis of cache pollution and thrashing in high-performance parallel applications. The results show that caches with higher associativity suffer more from these problems. Up to 28% of L1 cache misses could be avoided with a smarter replacement policy, rising to 62% in the L2 cache and 98% in the LLC.
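As a rough illustration of how the replacement policy alone can drive the miss count, the following minimal sketch simulates a single cache set under a cyclic access pattern. It is not the methodology of the work above: the function `count_misses`, the 4-way set, the synthetic trace, and the use of MRU as a stand-in for a "smarter" policy are all assumptions chosen for illustration.

```python
from collections import OrderedDict

def count_misses(trace, ways, policy="lru"):
    """Simulate one cache set with `ways` lines and return the miss count."""
    lines = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for block in trace:
        if block in lines:
            lines.move_to_end(block)  # hit: mark as most recently used
            continue
        misses += 1
        if len(lines) >= ways:
            # LRU evicts the oldest entry, MRU evicts the newest one.
            lines.popitem(last=(policy == "mru"))
        lines[block] = True
    return misses

# Hypothetical workload: sweep cyclically over one block more than the set holds.
ways = 4
trace = list(range(ways + 1)) * 10  # 50 accesses to 5 distinct blocks

lru_misses = count_misses(trace, ways, "lru")
mru_misses = count_misses(trace, ways, "mru")
print(f"LRU misses: {lru_misses}, MRU misses: {mru_misses}, accesses: {len(trace)}")
```

With this trace, LRU always evicts the block that is needed next, so every access misses (thrashing), while the alternative policy keeps most of the working set resident. Real workloads and real hierarchies are far more varied, so the sketch only shows the mechanism, not the percentages reported above.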


Author(s):  
И.И. Левин ◽  
А.И. Дордопуло

To solve applied problems whose hardware costs exceed the available computing resource of FPGA-based computer systems, an original technique was developed for mapping the information graph of an application program onto the architecture of a reconfigurable computing system. The technique is based on performance reduction methods, which lower the performance of an applied task and, with it, the hardware costs of its implementation, so that the problem can be solved on the available computing resource. We demonstrate that the hardware costs of realizing the computing structure decrease only when reducing the number of basic subgraphs, the number of computing devices in a basic subgraph, or the data width. The influence of sequential reduction transformations on the computing structure of a problem is examined.
The proved theorems concern the representation of the reduction coefficient as a product of the coefficients of successive reductions, the impossibility of an additive increase in the reduction coefficient during sequential reductions, and the commutativity of the superposition of different sequential reductions. The proved theorems and the corollaries presented in the article make it possible to formulate the basic principles of a method of reduction transformations of the information graph of the problem for adaptation to the architecture of a hybrid reconfigurable computing system. A distinctive feature of the technique is the relatively small number of transformations required for a balanced reduction of the information graph of the problem and for the implementation of the task on a reconfigurable computer system. For the developed technique, we estimated the maximal number of transformations and found a decrease in the number of reduction variants that must be analyzed in each class. The proposed technique significantly reduces the time needed to create the computational structure of a parallel program adapted to the architecture and configuration of the reconfigurable computing system. Furthermore, the technique allows this process to be automated with specialized software and provides an efficiency of at least 50-75% compared with solutions of the same problems produced by specialists.
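The three properties of sequential reductions can be illustrated with a toy model. The sketch below is not the authors' formalism: it assumes a computing structure described by three integer parameters (number of basic subgraphs, devices per subgraph, data width) whose product stands in for hardware cost, and the names `structure`, `apply_reduction`, and `cost` are hypothetical.

```python
from itertools import permutations
from math import prod

# Toy model (an assumption for illustration): hardware cost is taken to be
# proportional to the product of the three structural parameters.
structure = {"subgraphs": 16, "devices": 8, "width": 64}

def apply_reduction(struct, parameter, coefficient):
    """Return a copy with one parameter divided by an integer coefficient."""
    if struct[parameter] % coefficient:
        raise ValueError("coefficient must divide the parameter evenly")
    reduced = dict(struct)
    reduced[parameter] //= coefficient
    return reduced

def cost(struct):
    """Stand-in hardware cost: product of all parameters."""
    return prod(struct.values())

def apply_all(struct, sequence):
    """Apply a sequence of (parameter, coefficient) reductions in order."""
    for parameter, coefficient in sequence:
        struct = apply_reduction(struct, parameter, coefficient)
    return struct

# Hypothetical reduction steps with coefficients 4, 2 and 2.
steps = [("subgraphs", 4), ("devices", 2), ("width", 2)]

# Every ordering of the steps yields the same reduced structure (commutativity).
results = [apply_all(structure, order) for order in permutations(steps)]

# The overall coefficient is the product 4 * 2 * 2, not the sum 4 + 2 + 2.
overall = cost(structure) // cost(results[0])
print(results[0], overall)
```

In this simplified setting the overall reduction coefficient equals the product of the individual coefficients and does not depend on the order of the steps, which is why the order can be chosen for convenience when searching for a balanced reduction.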


Author(s):  
Alexey Igorevich Dordopulo ◽  
Vasiliy Borisovich Kovalenko ◽  
Viacheslav Alexandrovich Gudkov ◽  
Liubov Mikhailovna Slasten

1988 ◽  
Vol 108 (4) ◽  
pp. 260-267
Author(s):  
Kazuo Takaragi ◽  
Ryoichi Sasaki ◽  
Yasuhiko Nagai

2012 ◽  
Vol 17 (4) ◽  
pp. 207-216 ◽  
Author(s):  
Magdalena Szymczyk ◽  
Piotr Szymczyk

MATLAB is a technical computing language used in a variety of fields, such as control systems, image and signal processing, visualization, and financial process simulation, within an easy-to-use environment. MATLAB offers "toolboxes", which are specialized libraries for a variety of scientific domains, and a simplified interface to high-performance libraries (such as LAPACK, BLAS, and FFTW). MATLAB now also supports parallel computing through the Parallel Computing Toolbox™ and MATLAB Distributed Computing Server™. In this article we present some of the key features of MATLAB parallel applications, focusing on the use of GPUs for image processing.

