Performance portable parallel programming of heterogeneous stencils across shared-memory platforms with modern Intel processors

Author(s):  
Lukasz Szustak ◽  
Pawel Bratek

In this work, we take up the challenge of performance-portable programming of heterogeneous stencil computations across a wide range of modern shared-memory systems. An important example of such computations is the Multidimensional Positive Definite Advection Transport Algorithm (MPDATA), the second major part of the dynamic core of the EULAG geophysical model. To this end, we develop a set of parametric optimization techniques and a four-step procedure for customizing the MPDATA code. These techniques include the islands-of-cores strategy, (3+1)D decomposition, exploitation of data parallelism and simultaneous multithreading, data flow synchronization, and vectorization. The proposed adaptation methodology helps us develop an automatic transformation of the MPDATA code that achieves high sustained, scalable performance on all tested ccNUMA platforms with recent generations of Intel processors. This means that, for a given platform, the sustained performance of the new code stays at a similar level regardless of the problem size. The highest performance utilization rate, about 41–46% of the theoretical peak measured for all benchmarks, is achieved on any of the two-socket servers based on the Skylake-SP (SKL-SP), Broadwell, and Haswell CPU architectures. At the same time, the four-socket server with SKL-SP processors achieves the highest sustained performance of around 1.0–1.1 Tflop/s, which corresponds to about 33% of the peak.

Author(s):  
Bei Wang ◽  
Stephane Ethier ◽  
William Tang ◽  
Khaled Z Ibrahim ◽  
Kamesh Madduri ◽  
...  

The gyrokinetic toroidal code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5-D Vlasov–Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P’s multiple levels of parallelism, including internode 2-D domain decomposition and particle decomposition, as well as intranode shared memory partition and vectorization, have enabled pushing the scalability of the PIC method to extreme computational scales. In this article, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) coprocessors and performance comparisons with state-of-the-art homogeneous HPC systems such as Blue Gene/Q. New discovery science capabilities in the magnetic fusion energy application domain are enabled, including investigations of ion–temperature–gradient driven turbulence simulations with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidth, interconnects, and network topologies. These performance comparisons using a realistic discovery-science-capable domain application code provide valuable insights on optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.


1998 ◽  
Vol 38 (8-9) ◽  
pp. 443-451 ◽  
Author(s):  
S. H. Hyun ◽  
J. C. Young ◽  
I. S. Kim

To study propionate inhibition kinetics, seed cultures for the experiment were obtained from a propionate-enriched steady-state anaerobic Master Culture Reactor (MCR) operated in semi-continuous mode for over six months. The MCR received a loading of 1.0 g propionate COD/l-day and was maintained at a temperature of 35±1°C. Tests using serum bottle reactors consisted of four phases. Phase I tests measured anaerobic gas production as a screening step for a wide range of propionate concentrations. Phase II was a repeat of phase I but with more frequent sampling and detailed analysis of components in the liquid sample using gas chromatography. In phase III, different concentrations of acetate were added along with 1.0 g propionate COD/l to observe acetate inhibition of propionate degradation. Finally, in phase IV, different concentrations of propionate were added along with 100 and 200 mg acetate/l to confirm the effect of mutual inhibition. Biokinetic and inhibition coefficients were obtained by fitting the models of Monod, Haldane, and Han and Levenspiel with a non-linear curve-fitting technique. Results showed that the values of kp, the maximum propionate utilization rate, and Ksp, the half-velocity coefficient for propionate conversion, were 0.257 mg HPr/mg VSS-hr and 200 mg HPr/l, respectively. The values of kA, the maximum acetate utilization rate, and KsA, the half-velocity coefficient for acetate conversion, were 0.216 mg HAc/mg VSS-hr and 58 mg HAc/l, respectively. The results of phase III and IV tests indicated non-competitive inhibition when the acetate concentration in the reactor exceeded 200 mg/l.
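As a rough illustration of how the reported coefficients enter a rate expression, the sketch below evaluates the standard Monod form, k·S/(Ks + S), with the kp, Ksp, kA, and KsA values quoted above. The non-competitive inhibition factor uses the textbook form 1/(1 + I/Ki). The abstract does not report an inhibition coefficient, so `ki` is a purely hypothetical parameter, and the function names are illustrative rather than taken from the study.

```python
# Reported Monod coefficients (from the abstract above).
K_P = 0.257    # max propionate utilization rate, mg HPr / mg VSS-hr
KS_P = 200.0   # half-velocity coefficient for propionate, mg HPr / l
K_A = 0.216    # max acetate utilization rate, mg HAc / mg VSS-hr
KS_A = 58.0    # half-velocity coefficient for acetate, mg HAc / l


def monod_rate(s, k, ks):
    """Specific utilization rate from the Monod model: k * S / (Ks + S)."""
    if s < 0:
        raise ValueError("substrate concentration must be non-negative")
    return k * s / (ks + s)


def noncompetitive_factor(inhibitor, ki):
    """Textbook non-competitive inhibition factor: 1 / (1 + I / Ki).

    `ki` is a hypothetical inhibition coefficient; the abstract gives no value.
    """
    if ki <= 0:
        raise ValueError("ki must be positive")
    return 1.0 / (1.0 + inhibitor / ki)


def inhibited_propionate_rate(propionate, acetate, ki):
    """Propionate utilization rate reduced by acetate (assumed inhibitor)."""
    return monod_rate(propionate, K_P, KS_P) * noncompetitive_factor(acetate, ki)
```

At a substrate concentration equal to the half-velocity coefficient, the Monod rate is half the maximum, so `monod_rate(KS_P, K_P, KS_P)` evaluates to K_P / 2. Any value passed for `ki` would have to come from a fit to measured data such as the phase III and IV results.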


2021 ◽  
Vol 1 ◽  
pp. 1529-1536
Author(s):  
Mohammad Reza Dastmalchi ◽  
Bimal Balakrishnan ◽  
Danielle Oprean

Team collaboration is a critical necessity of the modern-day engineering design profession. This is no surprise given that teams typically possess more task-relevant skills and knowledge than individuals (Levine & Choi, 2004). Advancements in digital media provide new opportunities for collaboration across the design lifecycle. However, early stages of the design process still pose challenges to digitally mediated design collaboration due to greater representational abstraction and the presence of multiple modalities for design ideation. Usually, design teams spend a substantial amount of time generating a broad set of ideas that can lead them to a wide range of design solutions during the ideation phase. However, sooner or later, teams should narrow down their vision for a final solution. What factors influence team members to eliminate or select an idea? Our study is an attempt to demonstrate some examples of this challenge. By drawing on research in team cognition, particularly the concept of transactive memory system (TMS), we studied a design team's communication and media use during the ideation phase. The goal was to see if media type and communication modes can predict a team's decisions on selecting and eliminating ideas.


Author(s):  
Isaac Sánchez Barrera ◽  
Miquel Moretó ◽  
Eduard Ayguadé ◽  
Jesús Labarta ◽  
Mateo Valero ◽  
...  
