Trends of Intel MIC Application In Bioinformatics

Author(s):  
Xinyi Wang ◽  
Cangshuai Wu ◽  
Zhen Huang
Author(s):  
Miaoqing Huang ◽  
Chenggang Lai ◽  
Xuan Shi ◽  
Zhijun Hao ◽  
Haihang You

Coprocessors based on the Intel Many Integrated Core (MIC) Architecture have been adopted in many high-performance computer clusters. Typical parallel programming models, such as MPI and OpenMP, are supported on MIC processors to achieve parallelism. In this work, we conduct a detailed study of the performance and scalability of MIC processors under different programming models using the Beacon computer cluster. Our findings are as follows. (1) The native MPI programming model on MIC processors typically performs better than the offload programming model, which offloads the workload to MIC cores using OpenMP. (2) On top of the native MPI programming model, multithreading inside each MPI process can further improve the performance of parallel applications on computer clusters with MIC coprocessors. (3) Given a fixed number of MPI processes, it is a good strategy to schedule these processes on as few MIC processors as possible to reduce cross-processor communication overhead. (4) The hybrid MPI programming model, in which data processing is distributed to both MIC cores and CPU cores, can outperform the native MPI programming model.
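Finding (3) can be illustrated with a small counting sketch. It is not taken from the paper: the ring communication pattern, the names `card_of` and `cross_card_links`, and the `capacity` parameter (MPI processes per coprocessor) are all assumptions made for illustration. It only counts how many neighbor links cross a coprocessor boundary under two placements and does not model real communication cost.

```c
#include <assert.h>

/* Hypothetical placement policies (not from the paper). */
typedef enum { PLACE_PACKED, PLACE_SPREAD } placement_t;

/* Map an MPI rank to a coprocessor index.
 * PACKED fills one coprocessor up to `capacity` ranks before using the next.
 * SPREAD deals ranks round-robin across all `num_cards` coprocessors. */
int card_of(int rank, int capacity, int num_cards, placement_t p)
{
    assert(rank >= 0 && capacity > 0 && num_cards > 0);
    if (p == PLACE_SPREAD) {
        return rank % num_cards;
    }
    return rank / capacity;
}

/* Count ring links (r -> r+1, wrapping around) whose two endpoints
 * sit on different coprocessors. The ring pattern is an assumption. */
int cross_card_links(int nranks, int capacity, int num_cards, placement_t p)
{
    assert(nranks > 0 && nranks <= capacity * num_cards);
    int crossings = 0;
    for (int r = 0; r < nranks; ++r) {
        int next = (r + 1) % nranks;
        if (card_of(r, capacity, num_cards, p) !=
            card_of(next, capacity, num_cards, p)) {
            ++crossings;
        }
    }
    return crossings;
}
```

Under these assumptions, 8 ranks with a capacity of 4 per coprocessor and 4 coprocessors available give 2 cross-coprocessor links when packed and 8 when spread, which is consistent with the direction of finding (3).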


2014 ◽  
Author(s):  
Jarno Mielikainen ◽  
Bormin Huang ◽  
Allen H. Huang

2015 ◽  
Vol 2015 ◽  
pp. 1-14 ◽  
Author(s):  
Xinmin Tian ◽  
Hideki Saito ◽  
Serguei V. Preis ◽  
Eric N. Garcia ◽  
Sergey S. Kozhukhov ◽  
...  

Efficiently exploiting SIMD vector units is one of the most important aspects of achieving high performance for application code running on Intel Xeon Phi coprocessors. In this paper, we present several effective SIMD vectorization techniques, such as less-than-full-vector loop vectorization, Intel MIC specific alignment optimization, and small matrix transpose/multiplication 2D vectorization, implemented in the Intel C/C++ and Fortran production compilers for Intel Xeon Phi coprocessors. A set of workloads from several application domains is employed to conduct the performance study of our SIMD vectorization techniques. The performance results show that we achieved up to 12.5x performance gain on the Intel Xeon Phi coprocessor. We also demonstrate a 2000x performance speedup from the seamless integration of SIMD vectorization and parallelization.
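As a rough illustration of the idea behind less-than-full-vector loop vectorization, the sketch below emulates a masked vector add in plain scalar C. It is an assumption-laden sketch, not the compiler's implementation: `VLEN`, `masked_add`, and `vector_add` are hypothetical names, and the lane count of 16 assumes single-precision floats in a 512-bit vector register. The point is only that a tail shorter than one full vector can be handled as one masked vector step instead of a scalar remainder loop.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed lane count: 16 floats fill one 512-bit vector register. */
#define VLEN 16

/* Scalar emulation of one masked vector add.
 * Lanes whose mask bit is 0 are neither read nor written. */
static void masked_add(float *dst, const float *a, const float *b,
                       unsigned mask)
{
    for (int lane = 0; lane < VLEN; ++lane) {
        if (mask & (1u << lane)) {
            dst[lane] = a[lane] + b[lane];
        }
    }
}

/* Compute dst[i] = a[i] + b[i] for i in [0, n).
 * Returns the number of (emulated) vector steps issued. */
size_t vector_add(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    size_t steps = 0;
    size_t i = 0;

    /* Full vectors: every lane is active. */
    for (; i + VLEN <= n; i += VLEN) {
        masked_add(dst + i, a + i, b + i, 0xFFFFu);
        ++steps;
    }

    /* Less-than-full tail: enable only the low `rem` lanes. */
    size_t rem = n - i;
    if (rem > 0) {
        unsigned mask = (1u << rem) - 1u;
        masked_add(dst + i, a + i, b + i, mask);
        ++steps;
    }
    return steps;
}
```

With `n = 20`, this sketch issues two steps: one full 16-lane step and one masked step covering the remaining 4 elements, leaving memory past the end of the arrays untouched.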


Author(s):  
Paul R. Woodward ◽  
Jagan Jayaraj ◽  
Pei-Hung Lin ◽  
Michael Knox ◽  
Simon D. Hammond ◽  
...  
