scholarly journals AtTune: A Heuristic based Framework for Parallel Applications Autotuning

Author(s):  
Hiago Rocha ◽  
Janaina Schwarzrock ◽  
Monica Pereira ◽  
Lucas Schnorr ◽  
Philippe Navaux ◽  
...  

Several aspects limit the scalability of parallel applications, e.g., off-chip bus saturation and data synchronization. Moreover, the high cost of cooling HPC systems, which can outweigh the cost of developing the system itself, has pushed the parallel application’s execution to another level of requirements, in terms of performance and energy. In this work, we propose AtTune: a heuristic-based framework for tuning the number of processes/threads and CPU frequency to optimize the parallel applications’ execution. AtTune is transparent for the user, independent of the input size, and it optimizes for different parallel programming models. We evaluated our proposed solution considering five well-known kernels implemented in MPI and OpenMP. Experimental results on two real multi-core systems showed that AtTune improves up to 36%, 11%, and 32% the energy efficiency, performance, and Energy-Delay Product, respectively.

Author(s):  
Miaoqing Huang ◽  
Chenggang Lai ◽  
Xuan Shi ◽  
Zhijun Hao ◽  
Haihang You

Coprocessors based on the Intel Many Integrated Core (MIC) Architecture have been adopted in many high-performance computer clusters. Typical parallel programming models, such as MPI and OpenMP, are supported on MIC processors to achieve the parallelism. In this work, we conduct a detailed study on the performance and scalability of the MIC processors under different programming models using the Beacon computer cluster. Our findings are as follows. (1) The native MPI programming model on the MIC processors is typically better than the offload programming model, which offloads the workload to MIC cores using OpenMP. (2) On top of the native MPI programming model, multithreading inside each MPI process can further improve the performance for parallel applications on computer clusters with MIC coprocessors. (3) Given a fixed number of MPI processes, it is a good strategy to schedule these MPI processes to as few MIC processors as possible to reduce the cross-processor communication overhead. (4) The hybrid MPI programming model, in which data processing is distributed to both MIC cores and CPU cores, can outperform the native MPI programming model.


2018 ◽  
Vol 84 ◽  
pp. 22-31 ◽  
Author(s):  
Adrián Castelló ◽  
Rafael Mayo ◽  
Kevin Sala ◽  
Vicenç Beltran ◽  
Pavan Balaji ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document