Optimizing zonal advection of the Advanced Research WRF (ARW) dynamics for Intel MIC

Author(s):  
Jarno Mielikainen ◽  
Bormin Huang ◽  
Allen H. Huang
2005 ◽  
Vol 18 (21) ◽  
pp. 4454-4473 ◽  
Author(s):  
Renguang Wu ◽  
Ben P. Kirtman

Abstract Equatorial Pacific sea surface temperature (SST) anomalies in the Center for Ocean–Land–Atmosphere Studies (COLA) interactive ensemble coupled general circulation model show near-annual variability as well as biennial El Niño–Southern Oscillation (ENSO) variability. There are two types of near-annual modes: a westward propagating mode and a stationary mode. For the westward propagating near-annual mode, warm SST anomalies are generated in the eastern equatorial Pacific in boreal spring and propagate westward in boreal summer. Consistent westward propagation is seen in precipitation, surface wind, and ocean current. For the stationary near-annual mode, warm SST anomalies develop near the date line in boreal winter and decay locally in boreal spring. Westward propagation of warm SST anomalies also appears in the developing year of the biennial ENSO mode. However, warm SST anomalies for the westward propagating near-annual mode occur about two months earlier than those for the biennial ENSO mode and are quickly replaced by cold SST anomalies, whereas warm SST anomalies for the biennial ENSO mode only experience moderate weakening. Anomalous zonal advection contributes to the generation and westward propagation of warm SST anomalies for both the westward propagating near-annual mode and the biennial ENSO mode. However, the role of mean upwelling is markedly different. The mean upwelling term contributes to the generation of warm SST anomalies for the biennial ENSO mode, but is mainly a damping term for the westward propagating near-annual mode. The development of warm SST anomalies for the stationary near-annual mode is partially due to anomalous zonal advection and upwelling, similar to the amplification of warm SST anomalies in the equatorial central Pacific for the biennial ENSO mode. The mean upwelling term is negative in the eastern equatorial Pacific for the stationary near-annual mode, which is opposite to the ENSO mode. 
The development of cold SST anomalies in the aftermath of warm SST anomalies for the westward propagating near-annual mode is coupled to large easterly wind anomalies, which occur between the warm and cold SST anomalies. The easterly anomalies contribute to the cold SST anomalies through anomalous zonal, meridional, and vertical advection and surface evaporation. The cold SST anomalies, in turn, enhance the easterly anomalies through a Rossby-wave-type response. The above processes are most effective during boreal spring when the mean near-surface-layer ocean temperature gradient is the largest. It is suggested that the westward propagating near-annual mode is related to air–sea interaction processes that are limited to the near-surface layers.
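
To make the advection terminology above concrete, the following is a minimal sketch of how an anomalous zonal advection term might be evaluated on a uniform grid. It assumes the conventional form, anomalous zonal current times the zonal gradient of mean SST with a minus sign; the abstract does not state the exact formulation used, and the function name, grid spacing, and values are hypothetical.

```python
import numpy as np

def anomalous_zonal_advection(u_anom, t_mean, dx):
    """Return -u' * d(T_mean)/dx in K/s on a uniform zonal grid.

    u_anom : anomalous zonal current (m/s), positive eastward
    t_mean : mean SST (deg C), ordered west to east
    dx     : grid spacing (m)
    Assumed textbook form, not taken from the paper itself.
    """
    dtdx = np.gradient(t_mean, dx)  # centred differences, one-sided at the ends
    return -u_anom * dtdx

# Illustrative values only: mean SST cooling eastward by 0.5 deg C per 1000 km.
t_mean = 29.0 - 0.5 * np.arange(6)
u_anom = np.full(6, 0.1)            # eastward current anomaly of 0.1 m/s
tendency = anomalous_zonal_advection(u_anom, t_mean, dx=1.0e6)
```

With a mean SST that decreases eastward, an eastward current anomaly gives a positive (warming) tendency, which is the sense in which anomalous zonal advection can generate warm SST anomalies.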


Author(s):  
Miaoqing Huang ◽  
Chenggang Lai ◽  
Xuan Shi ◽  
Zhijun Hao ◽  
Haihang You

Coprocessors based on the Intel Many Integrated Core (MIC) Architecture have been adopted in many high-performance computer clusters. Typical parallel programming models, such as MPI and OpenMP, are supported on MIC processors to achieve parallelism. In this work, we conduct a detailed study on the performance and scalability of the MIC processors under different programming models using the Beacon computer cluster. Our findings are as follows. (1) The native MPI programming model on the MIC processors typically performs better than the offload programming model, which offloads the workload to MIC cores using OpenMP. (2) On top of the native MPI programming model, multithreading inside each MPI process can further improve the performance of parallel applications on computer clusters with MIC coprocessors. (3) Given a fixed number of MPI processes, it is a good strategy to schedule these MPI processes on as few MIC processors as possible to reduce the cross-processor communication overhead. (4) The hybrid MPI programming model, in which data processing is distributed to both MIC cores and CPU cores, can outperform the native MPI programming model.
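
Finding (3) can be illustrated with a small placement sketch. This is not the authors' scheduler and uses no MPI calls. It only compares two hypothetical rank-to-coprocessor mappings and counts how many nearest-neighbour pairs would have to communicate across coprocessors, assuming a ring communication pattern. All function names and sizes are assumptions made for illustration.

```python
def pack_ranks(n_ranks, slots_per_card):
    """Fill one coprocessor completely before using the next one."""
    return [rank // slots_per_card for rank in range(n_ranks)]

def spread_ranks(n_ranks, n_cards):
    """Round-robin placement over all coprocessors, for comparison."""
    return [rank % n_cards for rank in range(n_ranks)]

def cross_card_pairs(placement):
    """Count ring-neighbour rank pairs that sit on different coprocessors."""
    n = len(placement)
    return sum(placement[r] != placement[(r + 1) % n] for r in range(n))

# Hypothetical setup: 8 ranks, 4 coprocessors available, 4 ranks fit on one card.
packed = pack_ranks(8, slots_per_card=4)   # uses only 2 of the 4 cards
spread = spread_ranks(8, n_cards=4)        # uses all 4 cards

print(cross_card_pairs(packed), cross_card_pairs(spread))
```

Under these assumptions the packed placement leaves far fewer neighbour pairs straddling two coprocessors than the round-robin placement does, which matches the direction of the finding. It says nothing about the actual overheads measured on Beacon.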


2013 ◽  
Vol 70 (1) ◽  
pp. 187-192 ◽  
Author(s):  
Adam Sobel ◽  
Eric Maloney

Abstract The authors discuss modifications to a simple linear model of intraseasonal moisture modes. Wind–evaporation feedbacks were shown in an earlier study to induce westward propagation in an eastward mean low-level flow in this model. Here additional processes, which provide effective sources of moist static energy to the disturbances and which also depend on the low-level wind, are considered. Several processes can act as positive sources in perturbation easterlies: zonal advection (if the mean zonal moisture gradient is eastward), modulation of synoptic eddy drying by the MJO-scale wind perturbations, and frictional convergence. If the sum of these is stronger than the wind–evaporation feedback—as observations suggest may be the case, though with considerable uncertainty—the model produces unstable modes that propagate weakly eastward relative to the mean flow. With a small amount of horizontal diffusion or other scale-selective damping, the growth rate is greatest at the largest horizontal scales and decreases monotonically with wavenumber.


2015 ◽  
Vol 2015 ◽  
pp. 1-14 ◽  
Author(s):  
Xinmin Tian ◽  
Hideki Saito ◽  
Serguei V. Preis ◽  
Eric N. Garcia ◽  
Sergey S. Kozhukhov ◽  
...  

Efficiently exploiting SIMD vector units is one of the most important aspects in achieving high performance of the application code running on Intel Xeon Phi coprocessors. In this paper, we present several effective SIMD vectorization techniques such as less-than-full-vector loop vectorization, Intel MIC specific alignment optimization, and small matrix transpose/multiplication 2D vectorization implemented in the Intel C/C++ and Fortran production compilers for Intel Xeon Phi coprocessors. A set of workloads from several application domains is employed to conduct the performance study of our SIMD vectorization techniques. The performance results show that we achieved up to 12.5x performance gain on the Intel Xeon Phi coprocessor. We also demonstrate a 2000x performance speedup from the seamless integration of SIMD vectorization and parallelization.
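
The idea behind less-than-full-vector loop vectorization can be emulated in a few lines. The sketch below is not the compiler's implementation. It mimics, with NumPy arrays standing in for 512-bit registers holding 16 single-precision lanes, how a final partial chunk could be executed as one masked vector operation instead of a scalar remainder loop. The function name and the `a * x + y` kernel are hypothetical choices.

```python
import numpy as np

VLEN = 16  # lanes in a 512-bit vector of 32-bit floats

def masked_axpy(a, x, y):
    """Compute a * x + y chunk by chunk, masking inactive lanes in the last chunk."""
    n = len(x)
    out = np.empty(n, dtype=np.float32)
    lanes = np.arange(VLEN)
    for start in range(0, n, VLEN):
        mask = lanes < (n - start)        # all True for a full vector
        active = int(mask.sum())
        xv = np.zeros(VLEN, dtype=np.float32)
        yv = np.zeros(VLEN, dtype=np.float32)
        xv[mask] = x[start:start + active]    # emulated masked load
        yv[mask] = y[start:start + active]
        rv = a * xv + yv                      # one full-width vector operation
        out[start:start + active] = rv[mask]  # emulated masked store
    return out

# 21 elements: one full vector of 16 lanes plus a 5-lane masked remainder.
x = np.arange(21, dtype=np.float32)
y = np.ones(21, dtype=np.float32)
result = masked_axpy(2.0, x, y)
```

The masked load and store are what let the remainder reuse the same vector code path. On real hardware this would be expressed through mask registers, not boolean arrays.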

