parallel programming model
Recently Published Documents


Total documents: 119 (five years: 13)
H-index: 9 (five years: 1)

Author(s):  
Olfa Hamdi-Larbi ◽  
Ichrak Mehrez ◽  
Thomas Dufaud

Many applications in scientific computing process very large sparse matrices on parallel architectures. The work presented in this paper is part of a project whose general aim is to develop an auto-tuning system that selects the best matrix compression format in the context of high-performance computing. The target system automatically selects the best compression format for a given sparse matrix, a numerical method processing this matrix, a parallel programming model, and a target architecture. This paper describes the design and implementation of the proposed concept. We consider a case study consisting of a numerical method reduced to the sparse matrix-vector product (SpMV), several compression formats, the data-parallel programming model, and a distributed multi-core platform as the target architecture. This study yields a set of novel metrics and parameters related to the considered programming model. These metrics are used as input to a machine-learning algorithm that predicts the best matrix compression format. An experimental study targeting a distributed multi-core platform and processing random and real-world matrices shows that our system can improve the accuracy of the machine learning by up to 7% on average.
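To make the role of the compression format concrete, here is a minimal sketch of SpMV over two common sparse formats. The abstract does not name the formats it evaluates, so the choice of COO and CSR is an assumption, and all function names are hypothetical. The code is sequential and only shows how the storage layout changes the loop structure that a parallel version would have to distribute.

```python
# Illustrative only: COO and CSR are assumed formats, not necessarily
# the ones studied in the paper. All names here are hypothetical.

def dense_to_coo(dense):
    """Return (rows, cols, vals) listing the nonzero entries of a dense matrix."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals


def dense_to_csr(dense):
    """Return (row_ptr, col_idx, vals); row i occupies vals[row_ptr[i]:row_ptr[i+1]]."""
    row_ptr, col_idx, vals = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(vals))
    return row_ptr, col_idx, vals


def spmv_coo(n_rows, rows, cols, vals, x):
    """y = A @ x for A in COO form: one scattered update per nonzero."""
    y = [0.0] * n_rows
    for i, j, v in zip(rows, cols, vals):
        y[i] += v * x[j]
    return y


def spmv_csr(row_ptr, col_idx, vals, x):
    """y = A @ x for A in CSR form: one independent dot product per row."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y.append(acc)
    return y


A = [[4, 0, 0],
     [0, 0, 2],
     [1, 3, 0]]
x = [1.0, 2.0, 3.0]
y_coo = spmv_coo(len(A), *dense_to_coo(A), x)
y_csr = spmv_csr(*dense_to_csr(A), x)
```

In the CSR loop each row is an independent unit of work, whereas the COO loop scatters updates into `y`. Differences of this kind are one plausible reason the best format can depend on the programming model and the architecture.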


Electronics ◽  
2021 ◽  
Vol 10 (21) ◽  
pp. 2681
Author(s):  
Joonmoo Huh ◽  
Deokwoo Lee

Shared memory is the most popular parallel programming model for multi-core processors, while message passing is generally used for large distributed machines. However, as the number of cores on a chip increases, the relative merits of shared memory versus message passing change, and we argue that message passing becomes a viable, high-performing parallel programming model. To test this hypothesis, we compare a shared memory architecture with a new message passing architecture on a suite of applications tuned for each system independently. Perhaps surprisingly, the fundamental behaviors of the applications studied in this work, when optimized for both models, are very similar, and both models execute efficiently on multicore architectures even though many of the implementations differ. Furthermore, if the hardware is tuned for message passing by supporting bulk message transfer and eliminating unnecessary coherence overheads, and if effective support is available for global operations, then some applications perform much better on a message passing architecture. Leveraging these insights, we design a message passing architecture that supports both memory-to-memory and cache-to-cache messaging in hardware. With the new architecture, message passing outperforms its shared memory counterpart on many of the applications, owing to the advantages of the message passing hardware over cache coherence. In the best case, message passing achieves up to a 34% increase in speed over its shared memory counterpart, and it achieves a 10% increase in speed on average. In the worst case, message passing is slower in two applications, CG (conjugate gradient) and FT (Fourier transform), because it could not handle their unique data sharing patterns as well as its shared memory counterpart. Overall, our analysis demonstrates the importance of considering message passing as a high-performing, hardware-supported programming model on future multicore architectures.
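The contrast between the two programming models can be sketched in a few lines. The example below is not the architecture or benchmark suite from the paper; it is a toy reduction, with hypothetical function names, that uses Python threads in both cases purely to show the difference in style: workers either update one shared variable under a lock, or they send partial results as messages and never touch shared state.

```python
import queue
import threading


def sum_shared_memory(data, n_workers):
    """Shared-memory style: every worker updates one shared total under a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)      # local work needs no synchronization
        with lock:                # the shared update must be protected
            total += partial

    threads = [threading.Thread(target=worker, args=(data[i::n_workers],))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def sum_message_passing(data, n_workers):
    """Message-passing style: workers send partial sums; only the receiver combines them."""
    inbox = queue.Queue()

    def worker(chunk):
        inbox.put(sum(chunk))     # communicate by sending a message

    threads = [threading.Thread(target=worker, args=(data[i::n_workers],))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Exactly one message per worker is expected.
    return sum(inbox.get() for _ in range(n_workers))


values = list(range(1000))
shared_result = sum_shared_memory(values, 4)
message_result = sum_message_passing(values, 4)
```

Both functions compute the same result. What differs is where coordination happens: in the lock around shared state, or in the explicit transfer of data between workers.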


2021 ◽  
Vol 17 (2) ◽  
pp. 145-158
Author(s):  
Ahmad Qawasmeh ◽  
Salah Taamneh ◽  
Ashraf H. Aljammal ◽  
Nabhan Hamadneh ◽  
Mustafa Banikhalaf ◽  
...  

Various high-performance techniques, such as profiling, tracing, and instrumentation, have been used to tune and enhance the performance of parallel applications. However, these techniques do not show how to explore the potential for parallelism in a given application. Animating and visualizing the execution of a sequential algorithm provide a thorough understanding of its usage and functionality. In this work, an interactive web-based educational animation tool was developed to assist users in analyzing sequential algorithms to detect parallel regions, regardless of the parallel programming model used. The tool simplifies the learning of algorithms and helps students analyze programs efficiently. Our statistical t-test study on a sample of students showed a significant improvement in their perception of the mechanism and parallelism of applications and an increase in their willingness to learn algorithms and parallel programming.
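As a small illustration of what detecting a parallel region in a sequential algorithm means, the sketch below contrasts a loop whose iterations are independent with a loop that carries a dependency between iterations. It is not taken from the tool described above; the function names are hypothetical and the thread pool is just one possible way to run the independent iterations.

```python
from concurrent.futures import ThreadPoolExecutor


def square_all_sequential(xs):
    """Each iteration reads only xs[i]: the loop body is a candidate parallel region."""
    out = []
    for x in xs:
        out.append(x * x)
    return out


def square_all_parallel(xs, n_workers=4):
    """Same computation with iterations handed to a pool; map preserves input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(lambda x: x * x, xs))


def prefix_sums(xs):
    """Each iteration needs the previous running total: a loop-carried dependency,
    so this loop cannot be split across workers without restructuring it."""
    out = []
    running = 0
    for x in xs:
        running += x
        out.append(running)
    return out


numbers = [1, 2, 3, 4, 5]
sequential_squares = square_all_sequential(numbers)
parallel_squares = square_all_parallel(numbers)
running_totals = prefix_sums(numbers)
```

The first loop can be parallelized under any programming model because no iteration depends on another; the second shows the kind of dependency a student would need to spot before attempting to parallelize.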


2021 ◽  
Vol 24 (1) ◽  
pp. 157-183
Author(s):  
Никита Андреевич Катаев

Automation of parallel programming is important at every stage of parallel program development. These stages include profiling of the original program, program transformation that allows higher performance to be achieved after parallelization, and, finally, construction and optimization of the parallel program. It is also important to choose a suitable parallel programming model to express the parallelism available in a program. On the one hand, the parallel programming model should be capable of mapping the parallel program to a variety of existing hardware resources. On the other hand, it should simplify the development of assistant tools and allow the user to explore the parallel program that the assistant tools generate in a semi-automatic way. The SAPFOR (System FOR Automated Parallelization) system combines various approaches to the automation of parallel programming. Moreover, it allows the user to guide the parallelization if necessary. SAPFOR produces parallel programs according to the high-level DVMH parallel programming model, which simplifies the development of efficient parallel programs for heterogeneous computing clusters. This paper focuses on the approach to semi-automatic parallel programming that SAPFOR implements. We discuss the architecture of the system and present the interactive subsystem, which is used to guide SAPFOR through program parallelization. We used the interactive subsystem to parallelize programs from the NAS Parallel Benchmarks in a semi-automatic way. Finally, we compare the performance of manually written parallel programs with that of programs built by the SAPFOR system.


2020 ◽  
Vol 23 (4) ◽  
pp. 866-886
Author(s):  
Vladimir Aleksandrovich Bakhtin ◽  
Dmitry Aleksandrovich Zakharov ◽  
Aleksandr Aleksandrovich Ermichev ◽  
Victor Alekseevich Krukov

DVM-system is designed for the development of parallel programs for scientific and technical calculations in the C-DVMH and Fortran-DVMH languages. These languages use a single DVMH model of parallel programming and extend the standard C and Fortran languages with parallelism specifications in the form of compiler directives. The DVMH model makes it possible to create efficient parallel programs for heterogeneous computing clusters whose nodes can use accelerators, graphics processors, or Intel Xeon Phi coprocessors as computing devices alongside universal multi-core processors. This article describes the method for debugging parallel programs in DVM-system, as well as new features of the DVM-debugger.


2020 ◽  
Vol 1566 ◽  
pp. 012070
Author(s):  
Junseok Cheon ◽  
Yeoneo Kim ◽  
Taekwang Hur ◽  
Sugwoo Byun ◽  
Gyun Woo

2020 ◽  
Vol 23 (4) ◽  
pp. 594-614
Author(s):  
Vladimir Aleksandrovich Bakhtin ◽  
Dmitry Aleksandrovich Zakharov ◽  
Andrey Nikolaevich Kozlov ◽  
Veniamin Sergeevich Konovalov

DVM-system is designed for the development of parallel programs for scientific and technical calculations in the C-DVMH and Fortran-DVMH languages. These languages use a single DVMH model of parallel programming and extend the standard C and Fortran languages with parallelism specifications in the form of compiler directives. The DVMH model makes it possible to create efficient parallel programs for heterogeneous computing clusters whose nodes can use accelerators, graphics processors, or Intel Xeon Phi coprocessors as computing devices alongside universal multi-core processors. This article describes the successful use of DVM-system to develop parallel software for solving a problem of radiation magnetic gas dynamics and for studying plasma dynamics in the QSPA channel.


2020 ◽  
Vol 23 (3) ◽  
pp. 247-270
Author(s):  
Valery Fedorovich Aleksahin ◽  
Vladimir Aleksandrovich Bakhtin ◽  
Olga Fedorovna Zhukova ◽  
Dmitry Aleksandrovich Zakharov ◽  
Victor Alekseevich Krukov ◽  
...  

DVM-system is designed for the development of parallel programs for scientific and technical calculations in the C-DVMH and Fortran-DVMH languages. These languages use a single DVMH model of parallel programming and extend the standard C and Fortran languages with parallelism specifications in the form of compiler directives. The DVMH model makes it possible to create efficient parallel programs for heterogeneous computing clusters whose nodes can use accelerators, graphics processors, or Intel Xeon Phi coprocessors as computing devices alongside universal multi-core processors. This article presents recently developed features of DVM-system.


2020 ◽  
Vol 105 ◽  
pp. 993-1001
Author(s):  
Jianjiang Li ◽  
Yajun Liu ◽  
Jian Pan ◽  
Peng Zhang ◽  
Wei Chen ◽  
...  
