Trends of Intel MIC Application In Bioinformatics

Coprocessors based on the Intel Many Integrated Core (MIC) Architecture have been adopted in many high-performance computer clusters. Typical parallel programming models, such as MPI and OpenMP, are supported on MIC processors to achieve parallelism. In this work, we conduct a detailed study of the performance and scalability of MIC processors under different programming models using the Beacon computer cluster. Our findings are as follows. (1) The native MPI programming model on the MIC processors typically performs better than the offload programming model, which offloads the workload to MIC cores using OpenMP. (2) On top of the native MPI programming model, multithreading inside each MPI process can further improve the performance of parallel applications on computer clusters with MIC coprocessors. (3) Given a fixed number of MPI processes, it is a good strategy to schedule these processes onto as few MIC processors as possible to reduce cross-processor communication overhead. (4) The hybrid MPI programming model, in which data processing is distributed to both MIC cores and CPU cores, can outperform the native MPI programming model.
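Finding (3) can be illustrated with a small placement calculation. The following is a minimal sketch in C, not code from the paper: the helper names (block_device, round_robin_device, cross_device_pairs) are hypothetical, and an all-to-all communication pattern is assumed as a rough proxy for cross-processor traffic. It counts how many pairs of MPI ranks end up on different coprocessors when ranks are packed onto as few devices as possible versus spread round-robin over more devices.

```c
#include <assert.h>

/* Hypothetical helper: device index for a rank under block placement,
   which fills one coprocessor before moving on to the next. */
static int block_device(int rank, int ranks_per_device) {
    assert(ranks_per_device > 0);
    return rank / ranks_per_device;
}

/* Hypothetical helper: device index under round-robin placement,
   which spreads ranks over all available coprocessors. */
static int round_robin_device(int rank, int num_devices) {
    assert(num_devices > 0);
    return rank % num_devices;
}

/* Count rank pairs (i, j) with i < j that sit on different devices.
   Assumption: every pair of ranks communicates (all-to-all pattern).
   param is ranks_per_device when use_block is nonzero,
   and num_devices otherwise. */
static int cross_device_pairs(int num_ranks, int param, int use_block) {
    int count = 0;
    for (int i = 0; i < num_ranks; i++) {
        for (int j = i + 1; j < num_ranks; j++) {
            int di = use_block ? block_device(i, param)
                               : round_robin_device(i, param);
            int dj = use_block ? block_device(j, param)
                               : round_robin_device(j, param);
            if (di != dj) {
                count++;
            }
        }
    }
    return count;
}
```

Under these assumptions, 8 ranks packed 4 per device (2 devices) yield fewer cross-device pairs than the same 8 ranks spread over 4 devices, which matches the direction of the finding. Real communication costs depend on the application's message pattern and the interconnect, which this count does not model.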

Multi-Kepler GPU vs. multi-Intel MIC: A two test case performance study

2014 International Conference on High Performance Computing & Simulation (HPCS) ◽

10.1109/hpcsim.2014.6903662 ◽

2014 ◽

Cited By ~ 4

Author(s):

Massimo Bernaschi ◽

Francesco Salvadore

Keyword(s):

Test Case ◽

Performance Study ◽

Intel MIC

Optimizing zonal advection of the Advanced Research WRF (ARW) dynamics for Intel MIC

10.1117/12.2069314 ◽

2014 ◽

Cited By ~ 1

Author(s):

Jarno Mielikainen ◽

Bormin Huang ◽

Allen H. Huang

Keyword(s):

Zonal Advection ◽

Intel MIC

Effective SIMD Vectorization for Intel Xeon Phi Coprocessors

Scientific Programming ◽

10.1155/2015/269764 ◽

2015 ◽

Vol 2015 ◽

pp. 1-14 ◽

Cited By ~ 8

Author(s):

Xinmin Tian ◽

Hideki Saito ◽

Serguei V. Preis ◽

Eric N. Garcia ◽

Sergey S. Kozhukhov ◽

...

Keyword(s):

High Performance ◽

Xeon Phi ◽

Performance Gain ◽

Intel Xeon Phi ◽

Performance Study ◽

Seamless Integration ◽

Small Matrix ◽

Performance Results ◽

Intel MIC ◽

Intel Xeon

Efficiently exploiting SIMD vector units is one of the most important aspects of achieving high performance for application code running on Intel Xeon Phi coprocessors. In this paper, we present several effective SIMD vectorization techniques, such as less-than-full-vector loop vectorization, Intel MIC-specific alignment optimization, and small matrix transpose/multiplication 2D vectorization, implemented in the Intel C/C++ and Fortran production compilers for Intel Xeon Phi coprocessors. A set of workloads from several application domains is employed to conduct the performance study of our SIMD vectorization techniques. The performance results show that we achieved up to 12.5x performance gain on the Intel Xeon Phi coprocessor. We also demonstrate a 2000x performance speedup from the seamless integration of SIMD vectorization and parallelization.
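The idea behind less-than-full-vector loop vectorization can be sketched in portable C. This is an illustration only, not the compiler's actual code generation: the function name saxpy_masked is hypothetical, and VLEN = 16 is an assumption corresponding to 16 single-precision lanes in a 512-bit vector. The inner lane loop stands in for one masked vector instruction, so the final partial chunk is handled by disabling lanes instead of falling back to a separate scalar remainder loop.

```c
#include <assert.h>
#include <stddef.h>

#define VLEN 16 /* assumed lane count: 512-bit vector / 32-bit float */

/* Computes y[i] += a * x[i] for i in [0, n), in VLEN-wide chunks.
   The last chunk may be only partially full; a lane mask selects
   which lanes are active so no element past n is read or written. */
static void saxpy_masked(size_t n, float a, const float *x, float *y) {
    assert(x != NULL && y != NULL);
    for (size_t base = 0; base < n; base += VLEN) {
        size_t remaining = n - base;
        /* Bit k set means lane k is active in this chunk. */
        unsigned mask = remaining >= VLEN
                            ? 0xFFFFu
                            : ((1u << remaining) - 1u);
        /* Stand-in for a single masked vector operation. */
        for (int lane = 0; lane < VLEN; lane++) {
            if (mask & (1u << lane)) {
                y[base + lane] += a * x[base + lane];
            }
        }
    }
}
```

With n = 19, for example, the first chunk runs with all 16 lanes active and the second with only the low 3 lanes enabled. On hardware with mask registers, each chunk would map to masked vector loads, a fused multiply-add, and a masked store; the scalar lane loop here only models that behavior.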
